ABSTRACT



This thesis presents a design for a paging system that may be used to 
implement a virtual memory on a large scale, demand paged computer 
utility. A model for such a computer system with a multi-level, 
hierarchical memory system is presented. The functional requirements of a 
paging system for such a model are discussed, with emphasis on the
parallelism inherent in the algorithms used to implement the memory 
management functions. 

A complete, multi-process design is presented for the model system. 
The design incorporates two system processes, each of which manages one 
level of the multi-level memory, being responsible for the paging system 
functions for that memory. These processes may execute in parallel with 
each other and with user processes. The multi-process design is shown to 
have significant advantages over conventional designs in terms of 
simplicity, modularity, system security, and system growth and
adaptability. An actual test implementation on the Multics system was
carried out to validate the proposed design. 

CHAPTER 1



Introduction 



This thesis will examine a general multiple process design of a 
paging system. Such a design could be used in the implementation of a
demand paged memory in any suitable computer operating system. As 
computer systems have grown in size, the operating systems have also 
greatly increased in size, scope, and complexity, especially so-called 
computer utilities and large time shared systems. The design presented 
here offers a method for simplifying one large component of such systems:
the memory management task. The resulting system is less complex yet
readily expandable to accommodate future system growth.

There are two central concepts underlying the design presented in the 
following chapters. These are the concept of a process as an abstraction
of a program in execution, and the concept of paging as a means of 
implementing a virtual memory. Before the motivation for designing a 
paging system as cooperating processes can be discussed, these two 
concepts warrant closer examination. 



1.1 Processes 

The essence of a process is the execution of a program. Numerous 
definitions of a process are given by various authors [Da68] [Ha70] 
[Di68a] but all include the notion of an execution point passing through 
the instructions of some program. Thus a process is an abstraction of the 
locus of control that passes through an executing procedure [De66].
The address space of a process, that is, the set of all memory 
addresses the process may reference, is an important component of a 
process. In fact, the address space of a process influences the 
computations the process can carry out to such an extent that we include 
the address space in our definition of a process. A process, then, 
consists of a pair: an execution point, or locus of control, and an 
address space. 

The process abstraction provides a natural way of describing an 
operating system. Each user's work is viewed as a process, i.e. a task to 
be performed. The operating system itself is seen as a task or process 
manager. The various facilities the operating system provides, such as 
memory or device management, can themselves be implemented as processes. 
Two good examples of systems designed around the process concept in this 
manner are Dijkstra's THE system [Di68b] and a multiprogramming system 
described by Hansen in [Ha70].

In any multi-processor computer system, processes offer a 
straightforward technique for achieving multi-processing (the simultaneous 
execution of two or more programs). Any physical processor (CPU) in the
system can execute any user or system process. This permits the operating
system to be multi-processed, i.e. different functions of the operating
system may be executed in parallel. Parallel execution of the operating 
system, or one component of the operating system (the paging system) is a 
central theme in this thesis. 

1.2 Paged Systems 

Paging is a common strategy for solving the memory allocation 
problem, one of the chief tasks any operating system must perform. 
Examples of systems using paged memories include Multics [Da68], TENEX
[Mu72], and IBM's VS systems [Wh74], [Sc73].

In a paged system the address space of a process is divided into 
contiguous pieces of fixed size called pages. Physical memory is 
partitioned in the same manner into contiguous blocks called page frames. 
When allocating memory to a process, any available page frame may be 
allocated to hold any page. 

Usually the memory of a large computer utility is organized into 
several physical levels L1, L2, ..., Ln. The access time and capacity of a
level increase with n, and each level is normally a different type of
memory device. For such devices, the smaller the access time the higher 
the cost per bit and therefore the smaller the capacity. By combining 
such components with widely varying speeds and size into a multi-level 
memory an overall memory system can be constructed whose capacity equals 
that of its largest component yet whose effective speed approaches that of 
its fastest component. 
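
As a rough illustration of this claim, consider only two levels. The symbols below are assumptions introduced for this sketch and are not defined in the thesis: t_1 is the access time of the faster level, t_2 is the total time to satisfy a reference that must go to the slower level, and h is the fraction of references satisfied in the faster level.

```latex
T_{\mathrm{eff}} = h\,t_1 + (1 - h)\,t_2,
\qquad
T_{\mathrm{eff}} \to t_1 \ \text{as}\ h \to 1 .
```

With purely illustrative values t_1 = 1 microsecond, t_2 = 10 milliseconds and h = 0.9999, this gives T_eff = 0.9999 + 1.0 = 1.9999 microseconds, close to the speed of the faster level even though nearly all of the capacity lies in the slower one.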

In such multi-level memories a process may reference only pages
residing in the primary (level 0) memory. Referencing a page not
allocated a page frame at the lowest level results in a page fault, an
event which causes the necessary operating system mechanisms to be invoked
to allocate a level 0 page frame to the page and cause the page to be read
into that page frame. The operating system modules and the data bases
these modules use to perform this task are called the paging system, or
page control. Page control is a resource manager, page frames being the
resource it manages.

1.3 Paging Systems as Processes 

There are many alternative methods for organizing and implementing 
the paging system functions. The most widely used is to have the user 
process itself perform the necessary memory management functions when 
needed, just as with any other system call. That is, the code that 
carries out the necessary operations to allocate page frames is executed 
in the user's address space just like a user program. 

This thesis will examine several ways for organizing paging systems 
as processes. The paging system can be broken down into several 
activities, for example, removing pages from primary memory when it 
becomes full to make room for other pages. In such a system, each 
activity of the paging system can be made a separate process, with its own 
address space. Thus the paging system becomes a set of cooperating 
sequential processes, running in parallel and asynchronously. Such 
systems will be called multi-process paging systems, and this thesis will
argue that such systems offer significant advantages in simplicity, 
modularity, security and expandability over more conventional designs. 
The work described in this thesis differs from a multiple process 
paging system proposed by Hoare [Ho73] in the number of processes used and 
the function assigned to each. The model developed by Saxena and Bredt 
[Sa75] is closer to what is described here. However, Saxena and Bredt use
a multi-level paging system that distinguishes user page faults from page
faults caused by system processes, a distinction found unnecessary in the 
design presented in Chapter 3. These differences and similarities are 
considered in more detail in section 3.3. 

1.4 Summary of Thesis 

The remainder of this thesis will examine the design and
implementation of paging systems for a large computer utility as several
cooperating processes. The Multics system will be used as a model of such
a computer utility. Multics was chosen because it is typical of large,
sophisticated time shared systems and incorporates both of the
prerequisite ideas already mentioned: a multi-level, demand paged memory,
and processes. Therefore the basic concepts are already present and need
not be added. 

Currently a major research effort is being made to engineer a 
security kernel for Multics [Sc75]. Redesigning the paging system
contributes to the certification of such a kernel by reducing both the
size and complexity of the code that must be verified. The original
impetus for the work described in this thesis was the need for simplifying 
kernel mechanisms such as paging. 

Chapter 2 discusses the basics of paging systems in detail. The 
objects page control uses to implement a large demand paged virtual memory 
are examined. Functions which the paging system must provide to the rest 
of the operating system are listed and discussed. 

In Chapter 3 paging systems are classed into three groups based on 
their organization. User process paging systems, illustrated by Multics, 
are those where the paging functions are performed in the user's process. 
System process paging systems utilize special system processes to 
implement the paging functions. Combination paging systems, using 
features of both of the other two types, include designs appearing in the 
literature due to Hoare [Ho73] and Saxena and Bredt [Sa75]. The author's
design for a combination multi-process paging system is presented, in
which memory allocation is performed in the user's process but other page
control functions are done in system processes. The significant
advantages of both types of multi-process paging systems are considered at
some length. 

A test implementation of the design on the Multics system is 
presented in Chapter 4, concentrating on the difficulties arising in an 
actual implementation and the insights gained from such an effort. The 
results of this test implementation are compared with the current 
implementation to see how well the goals of a multi-process implementation 
can be realized. 

Techniques for exploiting fully the parallelism available in a
multi-process paging system by eliminating global locking strategies are
examined in Chapter 5.

Chapter 6 concludes the thesis by summarizing the important results 
and drawing some final conclusions and observations. 

The three appendices present additional information on the 
implemented multi-process page control described in Chapter 4. Appendix A 
compares the design to the standard Multics page control. Appendix B 
lists the components of the implemented design, and Appendix C contains 
some of the actual PL/1 code from important portions of the design. 

CHAPTER 2



Basic Objects and Functions of Paging Systems 



In Chapter 1 the paging system, or page control, was loosely defined 
to be those procedures and data bases necessary to resolve page faults and 
provide the memory allocation task. This chapter will focus on exactly 
what functions and services page control must provide to the rest of the
system and what objects page control must implement in providing these 
functions. Such a description will help suggest how the parts of page 
control can best be divided along functional lines into several processes. 

Figure 2.1 illustrates the model of a memory system that will be 
assumed in the remainder of this thesis. The memory system is a 
hierarchical, multi-level memory consisting of three levels: 1. Primary 
memory, in which any data referenced by a processor must reside. 2. The 
paging device, or backing store (which need not be a single device) which 
acts as a large, high speed buffer between primary and secondary memory. 
3. Secondary storage, which provides long term storage of data and 
programs. For example, in such a system primary memory is often high
speed core memory; the paging device is often a drum (or a bulk store
device in the case of Multics); and disks and perhaps tape normally
provide secondary storage.

Figure 2.1. Model of a multi-level hierarchical memory system

While the model shown incorporates three levels of memory, more or 
fewer are possible. The actual number of levels should not be crucial in 
a well designed system. Indeed, the design presented in Chapter 3 will be 
seen to adapt easily to a multi-level memory with any number of levels. 

Pages are moved from level to level by the paging system. It is 
assumed that a page may reside in any or all levels of the memory at any 
given time; however, only one copy of the page may exist in each level.
If multiple copies of a page do exist in the memory hierarchy, they may
not all be identical. The most up to date version of a page will be the
copy in primary memory (if there is one), then the paging device copy (if
there is one).

2.1 Page Control Objects 

There are three objects of fundamental importance to page control: 
pages, the basic allocatable unit of virtual memory; page frames, the 
corresponding unit of physical memory; and address translation registers, 
which translate virtual addresses into absolute physical memory addresses. 



2.1.1 Pages 



In paged systems, the address space of a process is divided into 
units called pages, or sometimes virtual pages. A page is an abstraction 
of a portion of a process's address space, a set of consecutive virtual 
addresses (hence the term "virtual page"). Procedures and data are both
broken into pages, although this division into pages is invisible to the
programmer.

The number of consecutive virtual addresses (locations) in a page is
the page size. The page size is typically fixed at a power of two, and
generally ranges from 128 to 4,096. The page size is usually determined
by characteristics of the hardware in order to optimize the performance of
secondary memory. The virtual address space of a process is restricted
only by the hardware's limits on the number of pages the process may
reference.
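
Because the page size is a power of two, a linear virtual address can be split into a page number and a word offset within the page with a shift and a mask. The following is a minimal sketch of that arithmetic; the page size of 1024 words and the function names are assumptions chosen for illustration, not values taken from any particular system.

```python
PAGE_SIZE = 1024                          # assumed page size in words (a power of two)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # log2(PAGE_SIZE), here 10


def split_virtual_address(address):
    """Return (page number, word within page) for a linear virtual address."""
    page = address >> OFFSET_BITS         # high-order bits select the page
    word = address & (PAGE_SIZE - 1)      # low-order bits select the word
    return page, word


def join_virtual_address(page, word):
    """Inverse of split_virtual_address."""
    return (page << OFFSET_BITS) | word
```

For instance, with these assumed values address 5000 lies in page 4 at word 904, since 4 * 1024 + 904 = 5000.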



2.1.2 Page Frames 

The physical counterpart of a page is a page frame. Just as the
address space of a process is divided into pages, the physical memory in
the system is broken into page frames. A page frame is a contiguous area
of fixed size in some physical memory device. Each page frame can store a
number of bits of information, namely the same number of bits as in a page
(which depends upon the page size and the word size).

Page frames are the raw memory resource of the system. The number of 
page frames is strictly limited by the capacities of the various devices 
in the memory system. Often it is useful to distinguish among the page 
frames of each level, hence the terms "paging device page frame" or "core 
page frame" may be used. 

Memory allocation is done by assigning page frames to hold the pages 
needed by a process. A process may only reference pages which reside in a
primary memory page frame. Since the number of primary memory page frames
is quite small (on the order of hundreds) while the number of pages the 
processes in the system can address is much larger (by at least an order 
of magnitude) only a fraction of the pages can be in main memory at any 
time. The purpose of the paging system is to multiplex the page frames 
among the pages to give the appearance of a much larger primary memory. 

The paging system must keep track of the status of each page frame, 
whether allocated or available, at each level of the memory under its 
control. While there are many ways to organize the required information, 
we assume lists are used. There is nothing fundamental about using a list
structure for this purpose; the choice is largely for convenience. Thus
we assume that page control maintains two lists of page frames for each
level of memory it manages (primary or core memory, and the paging device;
secondary storage is assumed to be managed by the file system, see
section 2.1.4). These lists are a "used list" containing those page
frames currently allocated, and a "free list" consisting of those page 
frames not currently allocated. We further identify each list by its 
level, hence there will be a "core used list" and a "core free list", and 
corresponding paging device free and used lists. Note page control may 
want to keep certain information about the page frames on these various 
lists. For example, for every frame on the core used list, page control 
will want to record the identity of the page using that frame. We assume 
the page frames in a list may be ordered in an arbitrary manner. (For 
example, the lists might be structured as linked lists.) The reason for 
wishing to order the lists is made clear in section 2.2.2. 
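
The used and free lists for one level of memory might be sketched as follows. This is only an illustration under stated assumptions: the class names, the use of a double-ended queue, and the string used to identify a page are all hypothetical choices, and the sketch keeps the oldest allocation at the head of the used list.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageFrame:
    number: int                  # index of the frame within its memory level
    page: Optional[str] = None   # identity of the page held, None when free


class FrameLists:
    """A used list and a free list for one level of memory."""

    def __init__(self, frame_count):
        self.free = deque(PageFrame(n) for n in range(frame_count))
        self.used = deque()      # oldest allocation at the left end

    def allocate(self, page):
        frame = self.free.popleft()   # raises IndexError if no frame is free
        frame.page = page             # record which page uses the frame
        self.used.append(frame)
        return frame

    def release(self, frame):
        self.used.remove(frame)
        frame.page = None
        self.free.append(frame)


# One pair of lists per level managed by page control (sizes are arbitrary).
core = FrameLists(frame_count=4)
paging_device = FrameLists(frame_count=16)
```

Two instances, one for core and one for the paging device, give the four lists described above.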

These four lists, along with the page tables described in the next
section, are the fundamental data bases of page control, for they define 
the state of the memory. 

2.1.3 Address Translation Registers 

Since processes make references to virtual addresses of the form 
(page, word) while the physical processors executing the instructions of 
the process must reference real memory using physical addresses, there 
must be a mechanism for translating virtual addresses (references to 
virtual pages) to physical addresses (references to page frames). This is
done by associating with each virtual page an address translation 
register. The address translation register contains the address at which 
the contents of the virtual page may be found (i.e. the absolute address 
of the page frame bound to the page). All references to pages are made 
through the address translation registers. If the page has not been 
allocated a page frame a special tag indicates the fact and causes a 
special hardware fault when a reference is made to the address translation 
register. 

The address translation registers for all the pages in the address 
space of a process may be collected together into a page table. Typically 
the virtual pages in the address space are identified by a number: 0, 1,
..., n. The page table then is an array of address translation registers;
the ith page table entry is the address translation register for virtual 
page i. Because the address translation registers are grouped into a page 
table, they are often also called page table words, since each is
essentially a word in the page table. Hence we will use the term page
table word to refer to these page address translation registers (and to
distinguish them from address translation registers used for segments; see
the following section). The page table may be contained in special
hardware registers, or reside in memory as any other data. Of course, the 
physical processor must know the physical address of the page table. If 
the page table is maintained in memory, a special register, the page table 
base register, indicates where. This translation mechanism for paging is 
illustrated in Figure 2.2. 
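
The translation just described can be sketched in a few lines. This is a simplified model, not a description of any real hardware: the page table is a Python list of page table words, the fault tag is represented by None, and the hardware fault is represented by a hypothetical PageFault exception.

```python
class PageFault(Exception):
    """Stands in for the hardware fault raised by a tagged page table word."""

    def __init__(self, page):
        super().__init__(f"page fault on page {page}")
        self.page = page


FAULT = None  # hypothetical encoding of the special fault tag


def translate(page_table, page, word):
    """Map a virtual address (page, word) to an absolute physical address."""
    frame_address = page_table[page]   # the page table word for this page
    if frame_address is FAULT:         # no page frame is bound to the page
        raise PageFault(page)
    return frame_address + word


# Page 0 occupies the frame at absolute address 8192; page 1 has no frame.
page_table = [8192, FAULT]
```

A reference to (0, 5) translates to absolute address 8197, while any reference to page 1 raises the fault that invokes the paging system.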

Besides containing the physical address of the page, the page table 
word often contains some additional items, such as whether the page has 
been referenced recently or modified. The reason for recording these 
facts is usually to provide information to various page control 
algorithms. More will be said about the function of such additional 
information below. 

2.1.4 Segments and the File System

At this point a brief digression is in order. Although this thesis 
is concerned with paging systems and deals with pages as the basic 
component of a process's address space, it is necessary to also consider a 
higher level organization of the address space, namely segmentation. 
Segmentation has a profound influence on a paged system. 

Until now the address space of a process has been treated as strictly 
linear, a one dimensional array of words. In Multics and other segmented
systems this is not the case. The address space in a segmented system is 
two dimensional, containing multiple segments, each of which is itself a 
linear address space. Thus a virtual address in a segmented system 
consists of a segment number and a word number (offset) within the 
segment. Each segment is paged, so the offset within the segment is in 
two parts, as before: a page number and a word within the page. 

Instead of having a single page table, the address space of the 
process is now defined by a page table for each segment. There must be a 
page table base register for each page table; these will be called 
segment descriptor words and collected into a descriptor segment. The jth 
segment descriptor word contains the absolute address of the page table
for segment j. The descriptor segment of a process completely defines the 
address space of the process. The physical processor executing the 
instructions of the process must know the location of the descriptor 
segment for that process. A register called the descriptor segment base 
register is used for this purpose. The translation of a virtual address 
in a segmented, paged memory is shown in Figure 2.3.

Segments may be shared, i.e. in the address space of more than one
process. In this case there will be a segment descriptor word for the 
shared segment in the descriptor segment of each process sharing the 
segment. These segment descriptor words will all point to the same page 
table. 
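
A minimal sketch of segmented, paged translation with a shared segment follows. It is a model under simplifying assumptions: a descriptor segment is a dictionary from segment number to page table, a Python object reference stands in for the absolute address of the page table, and None stands in for the fault tag. The function name and the example numbers are hypothetical.

```python
def translate_segmented(descriptor_segment, segment, page, word):
    """Translate (segment, page, word) to an absolute physical address."""
    page_table = descriptor_segment[segment]   # via the segment descriptor word
    frame_address = page_table[page]           # via the page table word
    if frame_address is None:                  # fault tag: no frame allocated
        raise LookupError(f"page fault on segment {segment}, page {page}")
    return frame_address + word


# One shared segment: both descriptor segments refer to the same page table,
# although each process knows the segment by a different segment number.
shared_page_table = [4096, None]
process_a = {3: shared_page_table}
process_b = {7: shared_page_table}
```

Because both descriptor segments refer to the same page table, a page table word updated on behalf of one process is immediately seen by the other.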

While the paging system bears the responsibility for maintaining the 
page table words, the job of assigning a page table to a segment will be 
assigned to a different module, the segment manager. Since the number of 
segments in a process's address space is unlimited for most practical
purposes, a page table cannot be given to every segment. Instead, the 
available page tables are multiplexed, just as page frames are multiplexed 
among a large number of virtual pages. That is, segmentation implies 
dynamic page table word allocation. Allocation of page tables to segments 
is a task very similar to allocating page frames to pages. This job is 
performed by the segment manager and will not be discussed further here. 
Activating a segment (corresponding roughly to opening a file in many 
systems) results in the segment being assigned a page table. 

The paging system can deal only with segments that are active, i.e. 
have page tables. Deactivated segments, those not assigned page tables, 
are manipulated by the segment manager and the file system. 

Thus the page tables, though indispensable to the paging system, are 
not completely implemented by the paging system. Rather the task is 
shared with the segment manager (or segment control, as it is often 
called). And although segments per se are not really page control
objects, page control is aware of their existence and has some knowledge 
of their implementation. As a consequence, there is interaction between 
segment control and page control. This interaction is undesirable as it 
complicates both segment control and page control, and we would like to 
minimize the interface between segment control and page control. This 
interface will be examined in detail at a later time. (1) 

(1) Research in progress at the Computer Systems Research Division is
attempting to eliminate from page control this knowledge of segment control
and the implementation of segments.

Similarly, page control interacts with the file system and knows
about the file system's organization. Such knowledge complicates page
control, and minimizing the influence of the file system on page control 
is highly desirable. By the file system we mean the operating system 
modules which manage the permanent storage of segments on secondary 
memory. The file system is responsible for knowing where a segment is 
stored in secondary memory so that the paging system may bring the 
segment's pages into primary memory when needed. Secondary storage page 
frames, or "records", are allocated to segments by the file system when 
the segment is created. Thus, the file system must remember the location 
of each page, and a "file map" analogous to a page table is kept for each 
segment to retain this information. The file map itself can be stored in 
the file system. 

The structure of the file system may vary widely; however we will not 
be concerned here with the specific organization. The file system may be 
hierarchical as in Multics or flat (one-level).

2.2 Page Control Functions 

Having examined the basic objects the paging system manipulates we 
turn to the operations that page control must perform on these objects. 
The most important job of page control is allocating memory, that is, 
assigning free page frames to hold pages. When all available memory has 
been allocated, memory deallocation must occur to enable reuse of page 
frames. Memory deallocation removes pages from page frames thereby 
freeing the page frame for further use. Note that in a multi-level memory 
system a page may be allocated memory in one, several, or none of the
levels. 

Hence the two major functions of page control are: 

1. Memory allocation 

2. Memory deallocation 

Two other minor functions that a paging system may optionally provide
are:

1. Reconfiguration 

2. Wiring or Locking 

The following sections will consider all four of these in turn.

2.2.1 Memory Allocation 

Memory allocation is the primary task of the paging system. Recall
that a processor may only reference pages which are allocated main memory 
page frames. A reference to a page not allocated a main memory page frame 
causes a page fault. Assuming a free list is kept, as mentioned in 
section 2.1.2, the steps involved in allocating memory and thereby 
resolving the page fault are the following: 

1. A reference is made to the page, whose page table word contains a 
special tag, causing a hardware fault which results in the invocation of 
the paging system's main memory allocator. 

2. A free page frame is obtained from the core free list. 

3. The identities of the page to be read in and the frame the page is 
read from are saved in the collection of information associated with the 
main memory page frame. (This information is needed when deallocation
occurs.) 

4. A read operation is performed to copy the contents of the page 
into the main memory page frame. 

5. The absolute address of the page frame is placed in the page's 
page table word, replacing the special fault tag. (Note the fault tag
must remain until the read operation is completed.) 

Control may now be returned to the process that made the reference to 
the page. 
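
The allocation steps above can be sketched as follows. This is an illustrative model only: the Core class, the representation of a page table word as a (tag, location) pair, and the dictionary standing in for the device the page is read from are all assumptions made for the example.

```python
FAULT_TAG = "fault"  # hypothetical marker for a page with no main memory frame


class Core:
    """Main memory bookkeeping kept by page control (simplified)."""

    def __init__(self, frame_count):
        self.free_list = list(range(frame_count))  # free frame numbers
        self.used_list = []                        # allocated frames, oldest first
        self.frame_info = {}                       # frame -> (page, source location)
        self.contents = {}                         # frame -> data held in the frame


def resolve_page_fault(core, page_table, backing, page):
    """Steps 2 to 5 of main memory allocation, run after the fault (step 1)."""
    source = page_table[page][1]              # where the page currently resides
    frame = core.free_list.pop(0)             # step 2: obtain a free page frame
    core.frame_info[frame] = (page, source)   # step 3: save page and source identity
    core.contents[frame] = backing[source]    # step 4: read the page into the frame
    core.used_list.append(frame)
    page_table[page] = ("core", frame)        # step 5: replace the fault tag last
    return frame
```

The page table word is updated only after the read, mirroring the requirement that the fault tag remain until the read operation is completed.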

An important complication arises in a multiprocessing environment
with sharing. Care must be taken so that while the sequence of steps
described above is in progress, other processes sharing the page are
prohibited from repeating the steps. That is, two processes may not
allocate page frames for the same page simultaneously. This would lead to
several possibly inconsistent copies of the same page. There must be some
inhibiting mechanism which prevents a process from beginning the 
allocation procedure for a page if some other process has already started 
the allocation algorithm for that page. 

There are many ways of implementing such a mechanism. One is to 
permit only a single page to be involved in the allocation procedure at 
any given moment. For example, the allocation code could employ a lock, 
which any process executing the allocation algorithm must set. Since 
there may be a considerable delay involved during the read operation, this 
scheme may result in an impractically inefficient paging system. A per 
page mechanism, rather than a global mechanism which inhibits all 
allocation, seems desirable. There is much more to be said on this topic; 
the mechanism used to prevent multiple allocations for a single page is
very influential in determining the efficiency of the overall page control 
design. A closer examination of this issue is postponed until Chapter 5. 
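
To make the contrast between a global lock and a per page mechanism concrete, here is one possible per page scheme, sketched with threads standing in for processes. It is not the mechanism developed in this thesis; the class, its method names, and the sleep that stands in for the read delay are all assumptions made for illustration.

```python
import threading
import time


class PerPageAllocator:
    def __init__(self):
        self._table_lock = threading.Lock()  # guards the dictionaries, held briefly
        self._page_locks = {}                # page -> lock for that page only
        self.resident = {}                   # page -> frame number
        self.reads = 0                       # number of read operations started

    def _lock_for(self, page):
        with self._table_lock:
            return self._page_locks.setdefault(page, threading.Lock())

    def fault(self, page):
        with self._lock_for(page):           # faults on other pages proceed
            with self._table_lock:
                frame = self.resident.get(page)
            if frame is None:                # nobody has allocated this page yet
                frame = self._read_in(page)
                with self._table_lock:
                    self.resident[page] = frame
            return frame

    def _read_in(self, page):
        with self._table_lock:
            frame = self.reads               # next unused frame number
            self.reads += 1
        time.sleep(0.01)                     # stands in for the slow read
        return frame
```

The short table lock is never held across the simulated read, so a fault on one page does not delay faults on other pages, while concurrent faults on the same page result in a single allocation.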

Memory allocation must be performed at each level in the memory 
system. Thus memory allocation must also occur for the paging device. 
The only difference from main memory allocation is the manner in which 
allocation is initiated. Main memory allocation takes place in response 
to a page fault; paging device memory allocation is done in response to 
an explicit request made by the main memory deallocation algorithm as 
explained in the next section. Otherwise, the steps in allocating paging 
device memory to a page are identical to those for allocating main memory: 

1. A request is made to the paging device allocator for a paging 
device page frame. 

2. A free paging device page frame is chosen from the paging device
free list.

3. The identity of the page is stored in the collection of 
information associated with the paging device page frame. 

4. The contents of the page are copied into the page frame. 

5. If the page has a main memory page frame allocated, the identity 
of the paging device page frame is saved in the information associated 
with the main memory page frame, and vice versa (see Figure 2.4). 
Otherwise, the identity of the paging device page frame is placed in the 
page's page table word so that when a fault occurs the location of the 
page on the paging device is known. 

As was the case with main memory allocation, once allocation of a 
paging device page frame to a page has begun, the system must ensure some
other process does not duplicate the effort. The same mechanism used to
prohibit multiple main memory allocations may be employed. 

Memory allocation at the final level of the memory system is the duty 
of the file system, since the file system bears the responsibility for 
permanent storage of segments. 

2.2.2 Memory Deallocation 

The second step in allocating main memory listed in the preceding
section is to obtain a free page frame from the core free list. This list 
can be maintained only by deallocating main memory; i.e. reversing the 
steps of the allocation algorithm and thereby freeing page frames. This 
operation is commonly termed "page replacement" in paged systems. Page 
replacement, or memory deallocation, is nothing more than removing pages 
from the page frames in which they reside. 

The steps taken in deallocating a main memory page frame from its 
page are summarized below: 

1. A used page frame is selected from the core used list. 

2. The page contained in the page frame (which can be determined by
looking at the information associated with the page frame; see step 3 in
the allocation procedure) is copied to some other page frame in the memory
hierarchy (more on this shortly).

3. The physical address of the page frame stored in the page table is 
replaced by the address of the page frame copied to in step 2, and the 
fault tag is set. 

4. The page frame is added to the main memory free list. (For
security reasons, the contents of the page frame should be cleared to all
zeroes.)

Several comments are necessary to explain these steps further.
First, nothing has been said about how the deallocation algorithm is
started. The allocation process might note when performing step 2 that 
the free list was empty and thus issue a call to the deallocation routine. 
This has the undesirable effect of delaying the allocation. The approach 
taken in the design presented in Chapter 3 is to maintain the free list at 
some minimum size; whenever the supply of free page frames is depleted 
below the system determined limit, deallocation begins until the free list 
is sufficiently replenished. There is, of course, a significant tradeoff 
involved here: time spent in allocating memory versus effective memory 
utilization. Page frames on the free list represent unused physical 
memory. It is possible to utilize memory completely by allowing the free 
list to become or remain empty. But then allocating memory is slowed due 
to the necessity of first deallocating some other page frame so that a 
page frame is free. Although a delay in allocating memory to any one 
process should not lower throughput in a multiprogrammed system, two costs 
are involved: a process that presumably already has pages in memory is 
prevented from running, and response time for any one process is 
lengthened. 

Second, nothing has been said about the criteria to be used in 
choosing from the used list the page frame that is to be replaced. The 
method for making this decision is commonly called the "page replacement 
algorithm" and usually involves usage characteristics of the page 
contained in each page frame. For example, the First in, First out (FIFO)
page replacement strategy chooses whichever page frame has been allocated 
to a page for the longest time. Note this implies it is possible to order 
the page frames by the length of time they have been allocated. One way 
to do this, alluded to earlier, is to maintain the used list as a linked
list of page frames, the head of the list being the page frame in use for
the longest period of time. Newly allocated page frames are added at the 
end of the list. We will not be concerned with the details of specific 
page replacement algorithms; the discussion of paging systems here is 
intended to be general enough to permit almost any page replacement 
algorithm. It is worth noting however that some algorithms require 
special information be kept on each page. For example, a "used" bit is 
often associated with each page. This bit is set by the hardware when a 
reference is made to the page. The replacement algorithm may examine the 
bit, and reset the bit, in deciding what page should be deallocated. The 
details of one such scheme are given by Corbato [Co69].
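
To make the role of the used list and the used bit concrete, the following Python sketch shows plain FIFO selection next to a "second chance" variant that consults the used bit. The second-chance policy is only one common way of using such a bit and is not claimed to reproduce the scheme of [Co69]; the names Frame, select_fifo and select_second_chance are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    number: int
    used: bool = False   # set by the "hardware" on reference, reset by the algorithm


def select_fifo(used_list: deque) -> Frame:
    """Plain FIFO: the head of the used list has been allocated longest."""
    return used_list.popleft()


def select_second_chance(used_list: deque) -> Frame:
    """FIFO order, but a frame whose used bit is on is spared once."""
    while True:
        frame = used_list.popleft()
        if not frame.used:
            return frame
        frame.used = False        # reset the bit...
        used_list.append(frame)   # ...and treat the frame as newly allocated
```

Both functions assume a non-empty used list. The second terminates after at most one full pass, since every bit it meets is reset.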

A third comment with respect to memory deallocation pertains to 
copying the contents of the page to some other page frame in the 
hierarchy. There are two points of interest: what other page frame to 
use, and when the copying is necessary. 

The question of where the page is to go when ejected from main memory 
is answered by looking in the data associated with the page frame. Recall 
that step 3 of the main memory allocation algorithm given above remembers 
the page frame a page is read from when allocated main memory. If the 
page was read from a paging device page frame, it may be written back to 
that same frame by an appropriate output routine. Otherwise, the page was 
read from disk, and the paging device memory allocation mechanism is
invoked (as discussed in the previous section) to obtain a paging device 
page frame to allocate to the page and serve as the destination of the 
page. Under certain circumstances, or if the paging device itself is not 
part of the current memory configuration, the page's contents may instead 
be returned to their permanent file system location. 

The copying is necessary only under two circumstances: 1. The page 
has not yet been written into the paging device page frame. 2. The page 
has been altered by a write operation, and hence the copy in main memory 
differs from the paging device copy. The first situation is readily 
recognized; to aid in detecting the second situation many paged systems
include special hardware which associates a "modified" bit with each page.
This is similar to the used bit mentioned in conjunction with page
replacement, but the modified bit is set only when a write reference is
made to the page, e.g. a store instruction. This bit is examined by the 
deallocation algorithm; if it has been set then the page has been modified 
while in main memory and must be copied. 
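
The two decisions just described, where the page goes and whether it must be copied at all, can be summarized in a short Python sketch. It is an illustration only: CoreFrame, writeback_plan and allocate_pd_frame are hypothetical names, and the case in which the page is returned directly to its file system location is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CoreFrame:
    source_level: str             # "pd" (paging device) or "disk": where the page was read from
    source_frame: Optional[int]   # frame number at that level
    modified: bool = False        # hardware "modified" bit


def writeback_plan(frame: CoreFrame, allocate_pd_frame) -> Tuple[int, bool]:
    """Return (destination paging device frame, whether a copy is needed)."""
    if frame.source_level == "pd":
        dest = frame.source_frame     # write back to the frame the page came from
        not_yet_on_pd = False
    else:
        dest = allocate_pd_frame()    # read from disk: obtain a paging device frame
        not_yet_on_pd = True          # circumstance 1: no paging device copy exists yet
    must_copy = not_yet_on_pd or frame.modified   # circumstance 2: page was written
    return dest, must_copy
```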

Deallocation of paging device memory is analogous. The steps 
involved are as listed above for deallocating a page frame from its page. 
The comments apply equally well with only the following alterations: 

Utilization of memory on the paging device is less critical than with 
main memory. This is because there is assumed to be a much larger amount 
of memory on the paging device. Hence paging device page frames are a 
less critical resource; therefore it is feasible to maintain a larger 
number of page frames on the paging device free list than might be the 
case for main memory. 

Used and modified flags may also be associated with each paging
device page frame. The used flag may provide information to the paging 
device page replacement algorithm for determining which paging device page 
frame should next be deallocated. The modified flag determines when 
copying the contents of a page is necessary at deallocation time. 

2.2.3 Memory Reconfiguration 

The memory configuration is defined by the page frames available to 
page control for allocation. Memory reconfiguration consists of
dynamically adding page frames to, or removing page frames from, the
supply available to page control. To add memory to the system
dynamically, the page frames of the
memory unit must be added to the pool of page frames controlled by the 
paging system. The inverse operation of removing memory is slightly more 
complex. The page frames of the device being removed must be freed before 
they may be removed from the memory configuration. 

Reconfiguration is not, strictly speaking, a page control function.
It is included here because page control must cooperate in reconfiguring 
memory, and any paging system should be designed with an awareness of the 
problems of reconfiguration. Thus to assist in removing memory, a 
removing flag might be associated with each page frame. This flag is 
turned on by the reconfiguration algorithm. The allocation algorithm 
should be designed to ignore any page frames on the free list with the 
removing flag on. This prevents allocating to a page a page frame that 
will only have to be deallocated shortly. 
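
A minimal Python sketch of an allocation routine that honors the removing flag is given below. The names Frame and take_free_frame are hypothetical, and the free list is modelled as an ordinary Python list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    number: int
    removing: bool = False   # turned on by the reconfiguration algorithm


def take_free_frame(free_list: List[Frame]) -> Optional[Frame]:
    """Return the first free frame not being removed, or None if there is none."""
    for i, frame in enumerate(free_list):
        if not frame.removing:
            return free_list.pop(i)
    return None
```

Frames with the flag on simply stay on the free list, where the reconfiguration algorithm can collect them.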

Newly added memory may be treated simply as free page frames and
added to the free list for future use. Schell [Sc71] provides an
extensive examination of dynamic reconfiguration. The desire to perform
dynamic reconfiguration can complicate other page control functions
severely, as the next section will demonstrate.

2.2.4 Memory Wiring 

A useful function for the paging system to provide is that of 
"wiring" or "locking" memory. A "wired" page is simply a page that must 
always be allocated a page frame, thereby always remaining referenceable
by a physical processor. There is a second, more restricted type of
wiring which will be called "absolute wiring"; an "absolute wired" (or
"abs wired") page not only must be allocated a page frame at all times, 
but the same page frame at all times. This means that the absolute 
physical address of the page will not be changed. 

Some system functions must be wired, at least in part, in order to 
operate properly. The pages of page control and page control's data bases 
are an excellent example of this. In order to avoid an infinite recursive 
loop of repeatedly taking page faults while handling a page fault, at
least a portion of page control's procedures and data must be wired. 

Absolute wiring is necessary only if absolute physical addresses are 
used by parts of the system. The most likely place for this to occur is 
in the input/output programs. Channel or i/o programs may require
absolute memory addresses; if this is the case pages used as buffers for 
doing i/o to terminals, etc., once allocated a particular page frame, must
remain there. The only alternative, to somehow keep track of all the 
instructions that use the absolute address and alter these instructions 
every time the page is allocated a different page frame, is generally 
impractical. 

Providing for wired pages is fairly straightforward. An additional 
flag may be associated with each page frame. When a page must be wired, 
it is allocated a page frame and the wired flag is turned on, indicating 
the page frame may not be deallocated. In searching for a page frame to 
replace, the replacement algorithm must skip over any page frame whose
wired flag is on. A page may be unwired at any time if it no longer must 
remain referenceable, by merely turning off the wired flag. 

Absolute wiring may be provided in a similar fashion. An extra 
complication arises if in setting up a buffer a contiguous area of memory 
greater than the size of a page is required. In such a case the paging 
system must contrive to allocate some number of page frames which have 
consecutive absolute physical addresses. It may not always be possible to 
guarantee this can be accomplished. 

The chief difficulties involved in both wiring and abs wiring virtual 
pages are due to two sources: sharing of virtual pages, and 
reconfiguration. Since the same virtual page may be in the address space 
of several processes, two or more processes may desire that a particular
page be wired. In such a case, a simple flag is inadequate; a counter of 
the number of processes wiring the page is needed instead. Where security 
is an issue, additional mechanisms are needed to insure pages may be 
unwired only by a process that previously wired them. 
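
One way to replace the wired flag with a counter, while also recording which processes hold the wiring, is sketched below in Python. The class WiredPage and its methods are hypothetical; process identifiers are plain integers.

```python
from collections import Counter


class WiredPage:
    def __init__(self):
        self._wirers = Counter()   # process id -> outstanding wire requests

    def wire(self, pid: int) -> None:
        self._wirers[pid] += 1

    def unwire(self, pid: int) -> None:
        # Only a process that previously wired the page may unwire it.
        if self._wirers[pid] == 0:
            raise PermissionError(f"process {pid} has not wired this page")
        self._wirers[pid] -= 1

    @property
    def wire_count(self) -> int:
        return sum(self._wirers.values())

    @property
    def is_wired(self) -> bool:
        return self.wire_count > 0
```

The replacement algorithm would then skip any page frame whose page reports is_wired, just as it skips a frame whose wired flag is on.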

Reconfiguration poses a more difficult problem. Adding memory, of 
course, presents no difficulty. But consider what happens if an attempt 
is made to remove from the memory configuration page frames which have 
been wired or absolute wired. The reconfiguration must fail if an 
absolute wired page is encountered, for by definition its physical address 
cannot be changed. Simple wired pages can be handled, though not without 
some awkwardness. Remember a wired page must remain referenceable 
(allocated a page frame) at all times. Thus the page may be moved by
allocating a new page frame, copying the contents of the page into that
new page frame (meanwhile the page is still allocated the page frame being
deconfigured), and then replacing the address in the page table word of
the page with the physical address of the new page frame. Additional
complications occur if the virtual page is modified during the copy
operation. This problem is discussed fully by Schell [Sc71].
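
The relocation of a wired page can be sketched as follows. This is a simplified Python illustration with hypothetical names (Page, relocate_wired_page); main memory is modelled as a dictionary from frame number to contents, and modification of the page during the copy is deliberately not handled.

```python
from dataclasses import dataclass


@dataclass
class Page:
    frame: int               # frame number currently held in the page table word
    abs_wired: bool = False


def relocate_wired_page(page: Page, memory: dict, new_frame: int) -> None:
    """Move a wired page out of a frame that is being deconfigured."""
    if page.abs_wired:
        # By definition the physical address may not change.
        raise RuntimeError("absolute wired page: reconfiguration must fail")
    old_frame = page.frame
    memory[new_frame] = memory[old_frame]   # copy while the page stays referenceable
    page.frame = new_frame                  # switch the page table word to the new frame
    # Not handled here: the page being modified during the copy (see [Sc71]).
```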

2.3 Summary

The job of page control is to implement a large virtual memory for
processes by multiplexing the limited amount of physical memory. Page 
control deals with four objects: Pages are the basic unit of a process's 
address space. Page frames are the basic unit of allocatable physical 
memory. Page table words are used to map pages into page frames by 
translating virtual addresses referenced by processes into absolute 
physical addresses usable by hardware processors. Segments are logical 
units of information, either programs or data, consisting of one or more 
virtual pages. Each segment has a page table containing all the page
table words for the virtual pages of the segment. 

The chief functions of a paging system were seen to be memory 
allocation (assigning a page frame to hold the contents of a referenced 
page) and memory deallocation (removing the contents of a page from a page 
frame, freeing the page frame for allocation). Other functions related to 
page control discussed were memory reconfiguration (changing the pool of 
page frames available to page control), and memory wiring (prohibiting the
breaking of a page frame-page binding).

CHAPTER 3

Designs for Paging Systems

Now that the underlying concepts of paging systems have been
introduced and the functions required of such systems examined, we turn to 
the question of how to structure a paging system for a large computer 
utility. The Multics system will be used as the basis for the general
computer system model for which such a design is intended. 

Contemporary paging systems such as the Multics page control have not 
been implemented taking full advantage of the process concept even though 
the operating system itself implements and makes extensive use of 
processes. Rather each user process performs the functions of page 
control, using shared supervisor code and data. 

The first part of this chapter will present a method for classifying 
paging systems based on whether user or system processes implement the 
paging system. Multics will be used as an example of a paging system 
where the paging functions are performed by the user's own process. A 
simple change to convert the Multics design to one using a system process 
to perform the page control operations is then considered. Next a design 
splitting the paging functions among several processes is presented. This 
design was actually implemented and tested on the Multics system. 
(Chapter 4 discusses the details of this implementation.) Two other
similar designs appearing in the literature are contrasted to the proposed 
design. The advantages of these multi-process paging systems are 
demonstrated by comparisons with the current Multics page control. 

3.1 Paging System Structures 

We will divide paging systems into three broad categories depending 
upon the answer to the following question: Where, i.e. in what process, 
are the paging functions performed? The categories are: 

1. User-process paging systems, in which the page control functions 
described in Chapter 2 are performed by the users' processes. 

2. System-process paging systems, utilizing special system processes 
whose exclusive job is to carry out page control operations.

3. Combination paging systems, where some page control operations are 
done in the users' processes, others by system processes. 

A further division of paging systems can be made based on how many 
processes implement the paging system. (Clearly this is not meaningful 
for user process paging systems, since all the processes in such a system 
implement the paging functions.) Thus we might consider system process 
paging systems or combination paging systems utilizing only a single 
system process as opposed to multiple processes. As will be seen, 
however, the advantages of multiple processes are so compelling that once 
the concept of using a system process to perform paging functions is 
accepted, multiple processes seem a natural and obvious extension. 
In examining each of the different organizations for paging systems,
we will be particularly interested in the solution the design offers for two
crucial problems inherent in a multi-process environment allowing sharing
of pages among users. These two problems are data base contention and
page fault contention.

By data base contention we mean the interference caused by two or 
more processes attempting to access a common data base simultaneously. 
Hence data base contention is a direct consequence of multi-processing. 
Data base contention is only a problem, of course, when the data base may 
be written as well as read. When a process may alter a data base, unless 
all alterations can be performed in a single, uninterruptible operation, 
there is the danger that another process may find the data base in an 
inconsistent or outdated state. This is not a problem unique to paging 
systems, arising here due to the fact a central accounting of all memory 
resources must be kept by page control. As a simple but important 
example, if two processes wish to obtain free page frames simultaneously, 
the paging system must insure the same page frame is not allocated to 
both. 

Thus we wish to know what mechanisms the paging system design offers 
to provide exclusive access to essential data bases. Ideally the 
mechanism should be easy to understand and use as well as guaranteeing 
data base integrity and prevention of system deadlock. Usually some form 
of semaphore or lock is involved. 
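
The free page frame example can be illustrated with a lock in Python. The sketch below is a generic illustration of exclusive access, not a mechanism taken from any of the designs discussed here; FreeList and grab_all are hypothetical names, and threads stand in for processes.

```python
import threading


class FreeList:
    def __init__(self, frames):
        self._frames = list(frames)
        self._lock = threading.Lock()   # guards every access to _frames

    def take(self):
        """Remove and return one free frame, or None if the list is empty."""
        with self._lock:                # the check and the removal form one unit
            return self._frames.pop() if self._frames else None


def grab_all(free_list, out):
    """Keep taking frames until the list is exhausted."""
    while (frame := free_list.take()) is not None:
        out.append(frame)
```

If several threads run grab_all against the same FreeList, every frame is handed out exactly once, because the emptiness check and the removal cannot be separated by another process.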

Page fault contention, or more simply page contention, is caused by 
the sharing of information among users in a multi-processing environment. 
When users share information, the pages containing that information are in 
the address space of each user's process, and may be faulted on by any
referencing process. If users were not allowed to share pages, e.g. if
users executing the same program were always given their own copy of the 
program, page contention would be non-existent. By page contention, we 
mean the problem already mentioned in section 2.2.1. That is, two 
processes may not be allowed to allocate a page frame to the same page 
simultaneously, or multiple copies of the page in primary memory may 
result.

In some sense page contention is really data base contention in a 
different guise, for after all a page may be considered a data base. We 
differentiate between page contention and data base contention because 
separate mechanisms are normally employed to resolve each. While 
conceivably careful data base design can minimize data base contention, 
page contention cannot be avoided as long as the time required to read or
write a page between memory levels is long relative to instruction speeds. 
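
A per-page "in transit" indicator is one possible way of resolving page contention, and it can be sketched in Python as follows. The names PageEntry, resolve_fault and read_page are hypothetical, threads stand in for processes, and the reads_started counter exists only to make the effect observable.

```python
import threading


class PageEntry:
    def __init__(self):
        self.in_core = False
        self.in_transit = False
        self.reads_started = 0            # how often i/o was begun, for illustration
        self.cond = threading.Condition()


def resolve_fault(page: PageEntry, read_page) -> None:
    with page.cond:
        if page.in_core:
            return                        # another process already brought the page in
        if page.in_transit:
            while page.in_transit:        # wait for the other process's read to finish
                page.cond.wait()
            return
        page.in_transit = True            # first faulter: claim the page
        page.reads_started += 1
    read_page()                           # slow i/o, performed without holding the lock
    with page.cond:
        page.in_core = True
        page.in_transit = False
        page.cond.notify_all()            # release everyone waiting on this page
```

However many processes fault on the page while it is being read, only one read is started, so only one copy of the page appears in primary memory.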

The following sections present several designs for paging systems. 
Attention will be given to the techniques inherent to each for dealing 
with page contention and data base contention. 

3.2 Multics' User Process Page Control 

We begin our investigation of paging system designs with a typical,
contemporary paging system, namely the Multics page control (as it existed
in fall, 1975). The procedures of Multics page control execute in the
users' processes, qualifying it as a user process paging system under the
definition of the previous section.

3.2.1 The Current Multics Page Control 

A process taking a page fault in the Multics system begins all the 
required paging functions at the time of the fault. Thus allocation and 
deallocation of page frames in both levels of the memory must be done at 
page fault time. The complexity that this results in is well illustrated 
by Figure 3.1, which represents diagrammatically the Multics page control. 
The diagram is necessarily at a rather high level, omitting much detail. 
The boxes represent program modules (procedures) carrying out specified 
functions; the solid arrows depict procedure calls and the dashed arrows 
indicate interprocess messages. The following paragraphs describe the 
sequence of events represented by Figure 3.1 happening after a page fault. 

When the page fault code is invoked, the first thing done is to run 
the paging device page removal algorithm, as depicted by the call to the 
routine labeled "get free pd record" in Figure 3.1. This procedure checks 
to see if there are ten free paging device page frames. If there are fewer
than ten, enough paging device page frames are selected, one at a time, to
increase the number to ten, and the necessary i/o to remove their pages 
from the paging device is begun. 

At this point two complications arise. The first is due to hardware
limitations of the Multics system. It is not possible to perform read or 
write operations directly between the paging device and the disks in 
Multics, only between main memory and the paging device or between main 
memory and the disks. Thus, the operation of writing a page from the
paging device to the disks must be done in two steps: first a read 
operation, reading the page from the paging device to main memory; second, 
a write operation, writing the page from main memory to disk. This two 
step operation, a read followed by a write, is called a "read write 
sequence", or "rws". Note that performing a read write sequence requires 
a free main memory page frame. This is indicated in the diagram by the 
call made by the module "start rws" to the "find core" routine. 
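
The two steps of a read write sequence can be sketched in Python. This is a simplified illustration, not the Multics code: the three memory levels are modelled as dictionaries, the function and parameter names are hypothetical, and both transfers are performed synchronously for brevity.

```python
def read_write_sequence(pd, core, disk, pd_frame, disk_addr, find_core):
    """Move one page from the paging device to disk by way of main memory."""
    core_frame = find_core()              # a free main memory page frame is required
    core[core_frame] = pd[pd_frame]       # step 1: read paging device -> main memory
    disk[disk_addr] = core[core_frame]    # step 2: write main memory -> disk
    pd[pd_frame] = None                   # the paging device frame is now free
    return core_frame                     # caller may return this frame to the free list
```

The find_core argument plays the part of the "find core" routine: it must supply a main memory frame before the sequence can begin.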

The second complication results from the relatively long time 
required to perform a read or write operation on a page. To require that 
the faulting process wait until the i/o operations it may start as part of 
read write sequences are completed would intolerably delay the faulting 
process, causing poor response. Thus the i/o necessary to evict pages 
from the paging device is not waited on, but only started. When the 
completion of this i/o is signalled via a hardware interrupt, whatever 
process is currently executing must deal with the interrupt. Thus the 
task of deallocating paging device page frames, though begun by the 
process taking the page fault, is finished by whichever process happens to 
be running at the completion of the disk write operation. 

Returning to the discussion of Figure 3.1, we are now ready to 
resolve the page fault by calling the procedure named "read page", which 
must first allocate main memory space. This is done by a call to "find 
core" which is the main memory page replacement algorithm. When a free 
page frame has been created by evicting a page, it is returned to read 
page, which then may start a read operation to copy the contents of the 
faulted-on page into main memory. The faulting process must then wait 
until the read is completed, as indicated by the call to the procedure
"page wait". The completion is signalled via a hardware interrupt, which
is converted to a software notify.

Multics uses a single semaphore, called the global page table lock, 
to solve the data base contention problem. This lock must be set by a 
process before it may begin processing a page fault. The lock is released 
just before the process blocks itself by calling "page wait". In between 
these times, another process attempting to resolve a page fault must wait
until the lock is released. 

Waiting on the lock is done by repeatedly trying to set the lock 
until one succeeds in doing so. This "busy" waiting has two major 
implications: 1. A process may not block itself, giving up the processor,
while it has the page table lock set. If this were done, all page control 
functions would be prevented until the process were awakened and run 
again. 2. For efficiency reasons, the time spent with the lock set should 
be minimized, as this in turn minimizes the interference among processes 
due to the lock which results in wasted processor time. 
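
Busy waiting of this kind can be illustrated with a small Python sketch. It is not the Multics locking code: SpinLock is a hypothetical class, a non-blocking acquire on a Python lock stands in for an atomic test-and-set instruction, and the failed_attempts counter is only an approximate measure of the wasted looping.

```python
import threading


class SpinLock:
    """Busy-waiting lock: acquire() loops until it succeeds in setting the lock."""

    def __init__(self):
        self._flag = threading.Lock()   # stands in for an atomic test-and-set word
        self.failed_attempts = 0        # wasted tries; approximate, not synchronized

    def acquire(self):
        while not self._flag.acquire(blocking=False):   # try to set, never block
            self.failed_attempts += 1                   # processor time spent looping

    def release(self):
        self._flag.release()
```

The longer the lock is held, the more often competing processes go around the loop in acquire, which is the wasted processor time referred to above.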

Measurements show that when running the standard Multics system in a 
configuration with two processors, under a moderate to heavy load the 
processor time spent looping while waiting to lock the global page table 
lock can amount to 10% of the total system processor time. In certain 
extreme conditions this overhead can go as high as 20%. This effect would 
be even worse in a system with three or more processors. Hence the global 
locking strategy can have a severe impact on system performance. (1) 

(1) A recent experiment has shown that abandoning the processor rather than
looping on the lock will increase the performance of a three processor
system. This change may be incorporated into the system.

The global page table lock is not used to protect against page
contention. To do so would prevent any process from resolving a page
fault until all read and write operations caused by a previous page fault
had completed (including read write sequences). Instead, a per page lock
(implemented as a bit in the page table word of each page) is used. This
per page lock is set whenever i/o is begun on a page (which can only
happen with the global page table lock set) and remains set until the i/o
completes. Thus a process faulting on a locked page, even though it gains
control of the global page table lock, cannot start i/o to bring in the
page (or to throw it out). The process must wait until the lock is
released. Hence the per page lock protects the page while in transition.

3.2.2 A System Process Page Control Based on Multics

To introduce how a paging system implemented as a system process
might work and to see some of the potential advantages of such a design,
consider the following simple yet radical change to the design just
described: When a page fault occurs, instead of having the user process
execute the programs to resolve the fault, simply send a message to a
system page control process, and wait for a return message saying the
desired page has been bound to a page frame in main memory. Nothing else
is changed; the algorithms described previously and illustrated by Figure
3.1 have merely been made a separate process. Essentially what has
happened is that a page fault has been transformed from a call to the page
fault procedure to an interprocess message to the page control process. 

There are disadvantages to this design, mainly in terms of 
efficiency. The time required to resolve a page fault is increased by the
length of time required to send the message to the page control process 
and to schedule the page control process. 

What do we gain? First, the page control process has its own address 
space and execution point. A separate address space enables removal of 
all the paging algorithms and data bases from the user's address space. 
The execution point, as we shall see, allows parallel execution of the 
page control process.

A second benefit is guaranteed service. Since the messages to the
page control process (i.e. the page faults) can be ordered, we can serve 
the page faults in the order they occur. There is nothing in Multics 
currently to prevent an unlucky process from always being locked out of 
the page fault handling code by competing processes who always manage to 
lock the global page table lock first. (That this actually ever happens 
is a very remote possibility, but important if guaranteed service is a 
system goal.) 
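
The message-based arrangement and its first-come, first-served ordering can be sketched in Python, with threads standing in for processes. All names here (page_control_process, page_fault, the None shutdown message) are hypothetical, and appending to the served list stands in for the actual work of resolving a fault.

```python
import queue
import threading


def page_control_process(requests: queue.Queue, served: list) -> None:
    """Serve page fault messages strictly in arrival order."""
    while True:
        msg = requests.get()          # blocks until a fault message arrives
        if msg is None:               # hypothetical shutdown message
            return
        pid, page, reply = msg
        served.append((pid, page))    # stand-in for the real fault handling
        reply.set()                   # "return message": page is now in main memory


def page_fault(requests: queue.Queue, pid: int, page: int) -> None:
    """What a faulting user process does: send a message and wait for the reply."""
    reply = threading.Event()
    requests.put((pid, page, reply))
    reply.wait()
```

Because the queue delivers messages in the order they were sent, no faulting process can be overtaken indefinitely by its competitors.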

A third benefit is the elimination of the global page table lock. 
Since only a single process, the page control process, may access the 
paging system data bases, data base contention is impossible. This 
benefit seems illusory because the single process has replaced the global 
lock, and the overall effect is the same — only one page fault may be 
processed at a time; in fact only one page control function may be 
performed at a time, since there is only one process (and hence one 
execution point) to perform them. However, replacing a lock with a single
process is not only conceptually cleaner but also easier to understand and
show correct.

The important thing here is the fact that the process has an 
independent execution point as well as a separate address space. Once we 
realize this fact, the question arises as to why not change the algorithm
of Figure 3.1 to take advantage of this execution point? Why continue to
deallocate page frames only when resolving a page fault? Since the page
control process knows page frames will be needed, why not have him execute
the page replacement algorithm between page faults, when he would
otherwise be idle?

This concept of allowing independent parallel processing by a system 
process performing page control functions leads us directly to the
multi-process combination paging systems discussed in the next sections. 

3.3 Multi-process Combination Paging Systems 

Expanding on the possibilities suggested by the single system process 
design presented in the preceding section, three multi-process
combination paging systems are examined here. In each of these, the 
necessary allocation of main memory page frames to pages is performed by 
the faulting process. Deallocation, however, is done by the special
system processes. Thus these paging systems classify as combination 
paging systems as defined in section 3.1. Additionally, each design uses 
multiple processes to implement the system performed paging functions, 
hence the term multi-process combination paging systems. The number and
organization of the system paging processes are what distinguish the three 
designs. The first is due to the author and has been implemented on 
Multics (see Chapter 4); the other two designs have appeared in the 
literature. 

3.3.1 A Two Process Paging System 

In Chapter 2 it was noted that the work of the paging system can be 
described largely as allocating and deallocating page frames to and from 
pages. Allocating a page frame to a page is a relatively simple task that 
a process can do for itself, since there is no need for parallelism — the 
process cannot continue until the page fault is resolved. In demand paged 
systems, allocation is performed only upon actual reference to a page, 
because it is impossible in general to predict which pages in its address 
space a process may reference. 

Deallocating page frames (and thereby creating free page frames) is a 
more complex task involving decision making, namely choosing the page that
is to be replaced. Deallocation, unlike allocation, may be done at any 
time. 

In particular, page frames may be freed in advance, maintaining a 
pool of free page frames from which page frames are selected as needed. 
Replenishing the supply of free page frames may be done whenever 
convenient. The job of deallocating page frames may be assigned to a 
system process, distinct from user processes. Note this allows us to take 
advantage of the parallelism offered by a process. This completely
removes the page replacement function from the user process. There are
several immediately obvious advantages to such a strategy: 1. Page faults
may be resolved faster, since deallocation is no longer done at page fault 
time. 2. The page fault algorithm is simpler. 3. The procedures and data 
involved in doing the deallocation may be removed from the address space 
of the user process. These and other benefits of such a decision will be 
discussed fully later. 

Since the memory model assumed here (Figure 2.1) incorporates two
levels of memory managed by the paging system, two system processes will
be used in the multi-process page control suggested here. One will be
assigned the task of deallocating page frames for each level in the
memory. The three parts of the resulting design (handling page faults in
the faulting process being the third) are discussed in turn.

The Core Manager Process 

The special system process assigned the task of deallocating main 
memory page frames will be called the core manager process. The algorithm 
followed by the core manager is depicted in Figure 3.2. As long as the 
number of free page frames in the pool available for allocation is less 
than some system determined value, the core manager keeps deallocating 
page frames. First, the page replacement algorithm is invoked to decide 
which page frame is to be deallocated. Note this is strictly a policy 
decision. Once a page frame has been selected, it can be freed by writing 
the page out of main memory and changing the page table word for the page 
appropriately. When the write operation is completed, the newly freed 
page frame may be made available for allocation to some process requesting 
a page frame. This sequence of steps may be repeated until the supply of 
free page frames reaches some system determined value, at which time the 
core manager process blocks itself. Notice that processes may be 
requesting free page frames from the free pool even while the core manager 
is executing. 

Figure 3.2 

Algorithm of the core manager process 
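The loop just described can be sketched in a few lines. This is only an 
illustrative sketch, not the Multics code: the class name, the dictionary 
standing in for page table words, and the callables passed in for the 
replacement policy and the write operation are all assumptions made for 
the example. 

```python
class CoreManager:
    """Hypothetical sketch of the core manager loop of Figure 3.2."""

    def __init__(self, in_use, free_pool, target, choose_victim, write_out):
        self.in_use = in_use                # frame number -> page held in it
        self.free_pool = free_pool          # list of free frame numbers
        self.target = target                # system determined pool size
        self.choose_victim = choose_victim  # page replacement policy (assumed callable)
        self.write_out = write_out          # copies a page out of main memory
        self.page_table = {}                # stand-in for page table words

    def replenish(self):
        """Free frames until the pool reaches the target, then return (block)."""
        freed = []
        # The second condition is an added guard: stop if nothing is left to free.
        while len(self.free_pool) < self.target and self.in_use:
            frame = self.choose_victim(self.in_use)  # strictly a policy decision
            page = self.in_use.pop(frame)
            self.write_out(page, frame)              # returns when the write is done
            self.page_table[page] = None             # page is no longer in core
            self.free_pool.append(frame)             # frame may now be allocated
            freed.append(frame)
        return freed
```

Because the policy is passed in as a parameter, the same loop could serve 
another memory level with a different replacement policy and a different 
target value. 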

There must be some means of starting up the core manager process. 
One way to do this is to simply wake up the core manager periodically. An 
alternative strategy which adjusts to varying demands for free page frames 
is to wake up the core manager process whenever the pool of free page 
frames becomes low. This requires interprocess communication, for the 
process which notices the number of free page frames is down must wake up 
the core manager. That is, the routine which allocates free page frames 
must follow the algorithm shown in Figure 3.3. If there is at least one 
free page frame, it is immediately allocated to the caller. If the 
remaining supply of free page frames is under a system defined minimum, a 
wakeup is sent to the core manager process. However, if there are no page 
frames in the free pool, the allocation code must do one of two things: 
1. Report failure to its caller, who must try again later, or 2. Block the 
calling process until the core manager process signals that the supply has 
been increased. Of course, in either case the core manager must be 
awakened to start replenishing the free pool. The latter approach is 
chosen here because it results in an allocation strategy which always 
succeeds in the eyes of the caller, i.e. always returns a free page frame. 
This simplifies the code in the calling procedure. Indeed the caller will 
never know what happened, except perhaps that it took longer for the 
allocating procedure to return the requested page frame. (A complication 
may arise here with the use of locks; see Section 5.1.) 

Figure 3.3 

Algorithm of the page frame allocator procedure 

Two additional points remain to be made. First, adopting the just 
described strategy means the algorithm of Figure 3.2 is incomplete. An 
additional step must be included to send wakeup signals to any processes 
that have gone blocked because the page frame pool was empty. Second, 
since any number of processes may be requesting free page frames 
simultaneously, some technique is necessary to ensure a page frame is not 
allocated to two requestors. For example, a lock on the free pool is 
sufficient. The fact that several processes may be competing for any page 
frames in the free pool also explains the loop in the algorithm of Figure 
3.3. When a process is awakened by the core manager, there is no 
guarantee that there are still page frames in the free pool, since other 
processes may have grabbed them all. Therefore, after going blocked to 
await replenishment of the free page frame pool, the algorithm must be 
repeated from the beginning. 
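A minimal sketch of the allocator of Figure 3.3, including the lock on 
the free pool and the retry loop, is given below. It is written with 
Python's threading primitives purely for illustration: the condition 
variable plays the part of the lock plus the block and wakeup operations, 
and the event stands in for the wakeup sent to the core manager. The 
class and attribute names are hypothetical. 

```python
import threading
from collections import deque


class FrameAllocator:
    """Hypothetical sketch of the page frame allocator of Figure 3.3."""

    def __init__(self, free_frames, minimum):
        self.pool = deque(free_frames)
        self.minimum = minimum                  # system defined minimum
        self.available = threading.Condition()  # its lock guards the pool
        self.wake_manager = threading.Event()   # stand-in for the manager wakeup

    def allocate(self):
        with self.available:                    # lock on the free pool
            while not self.pool:                # retest after every wakeup:
                self.wake_manager.set()         # others may have taken the frames
                self.available.wait()           # go blocked
            frame = self.pool.popleft()
            if len(self.pool) < self.minimum:   # supply is running low
                self.wake_manager.set()
            return frame

    def replenish(self, frames):
        """Called by the manager: add frames and wake any blocked allocators."""
        with self.available:
            self.pool.extend(frames)
            self.available.notify_all()
```

From the caller's point of view allocate always returns a frame; the only 
visible effect of an empty pool is a longer wait. A real manager would 
also clear the wakeup event once it had started work, which this sketch 
omits. 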

The Paging Device Manager Process 

The paging device manager process is the second of the two system 
processes used to manage memory in our multi-process design. Chapter 2 
noted the similarity of the paging device memory to the main memory, and 
that allocating and deallocating page frames must be done for each level 
in the multi-level memory hierarchy assumed in our model. In fact, the 
allocation and deallocation of paging device page frames is so similar to 
the allocation and deallocation of main memory page frames that the 
algorithms to be used by the paging device manager process and the core 
manager process are almost identical. Figure 3.2 describes the paging 
device manager's algorithm as well as the core manager's algorithm. The 
details need not be the same, e.g. no doubt a different policy may be in 
force for deciding which paging device page frames are to be freed, but 
the general form and structure are the same. 

In a like manner, Figure 3.3 also describes the algorithm used by the 
paging device page frame allocating procedure, except of course the wakeup 
signals would be directed to the paging device manager process rather than 
the core manager process. The parameter used to trigger the signal to the 
paging device manager, the number of free paging device page frames, may 
also be different. 

Handling Page Faults 

Now that we have added two system processes to do the deallocating of 
page frames at each level of the multi-level memory system, we turn to the 
allocation operation. Figure 3.4 shows the steps necessary to resolve a 
page fault, i.e. allocate a main memory page frame to a page in a system 
using two system processes to perform deallocation. The first box invokes 
the page frame allocation procedure, previously presented in Figure 3.3. 
This may result in the faulting process blocking itself if no free page 
frames are available. In the usual case however, a free page frame will 
be available and will be returned. The page may then be bound to the 
allocated page frame, and the necessary read operation begun to read the 
contents of the page into the memory locations of the page frame. 

The remaining procedure needed to fill in the picture completely is 
the procedure which performs the allocating of pages to paging device page 
frames. This occurs during the freeing of main memory page frames. One 
of the steps in the algorithm of Figure 3.2 is to free the main memory 
page frame from its page. This deallocation results in the contents of 
the page being copied to some other page frame in the memory hierarchy. 
Thus the replacement really expands into the three steps already depicted 
in Figure 3.4 for allocating a page to a page frame. That is, a paging 
device page frame is allocated and the page is written to the memory 
locations of the paging device page frame. 

Figure 3.4 

Binding a page to a page frame 
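The point that page fault resolution and eviction share the same three 
steps can be made concrete with a short sketch. The function and 
parameter names below are hypothetical; the allocator and the transfer 
routine are passed in so that one routine serves both memory levels. 

```python
def bind_page(page, allocate_frame, page_table, transfer):
    """Hypothetical sketch of the three steps for binding a page to a frame."""
    frame = allocate_frame()    # 1. obtain a free frame (may block the caller)
    page_table[page] = frame    # 2. bind the page to the frame
    transfer(page, frame)       # 3. start the read (fault) or write (eviction)
    return frame
```

A page fault would call it with the main memory allocator and a read 
routine; the core manager, when evicting a page, would call it with the 
paging device allocator and a write routine. 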

The interrelationship of the core manager process, the paging device 
manager process, and a process trying to resolve a page fault is 
illustrated by Figure 3.5. The boxes represent program modules which 
perform the function indicated by their label. The solid arrows depict 
calls made by one module to another, and the broken arrows represent 
interprocess signals. For example, the main memory allocation procedure 
will send a wakeup signal to the core manager process when the number of 
core page frames becomes too low, as indicated by the broken arrow from 
the box labeled "allocate core" to the box titled "core manager". 
Similarly, if in removing pages from main memory the core manager 
discovers there is an insufficient supply of free paging device page 
frames, a wakeup signal is sent to the paging device manager process, 
represented by the arrow from "allocate pd record" to the "paging device 
manager". 

This design, as implemented on the Multics system as described in 
Chapter 4, incorporates the same features as the Multics page control for 
preventing data base contention and page fault contention. That is, a 
global page table lock prevents the core and paging device managers from 
executing simultaneously, or one of the system processes from running 
while a user process is resolving a page fault. Per page locks are also 
used to solve the page fault contention problem. However, one of the 
benefits seen from this design, as discussed in section 3.4.4, is the 
potential for splitting the global page table lock. This question will be 
considered fully in Chapter 5. 

3.3.2 Hoare's Structured Paging System 

Hoare has proposed [Ho73] a multi-process paging system intended for 
a general computer system. The model Hoare uses for a general computer 
system is similar to the model assumed here; the major difference in the 
models used is due to the one level memory incorporated into Hoare's 
model. That is, Hoare assumes a memory system consisting of a main memory 
and a drum as a backing store, but does not include a second level of 
memory such as the disks assumed here. 

Hoare uses monitors [Ho74] to describe his system. Monitors are 
procedures with built-in synchronization primitives. A monitor defines a 
group of procedures only one of which may be in execution at any time, 
thus ensuring mutual exclusion among processes executing the procedures 
comprising the monitor. Hence monitors are a high level locking device. 
In Hoare's system a monitor is assigned to each page; this monitor 
includes procedures to access the page, bring it into main memory on 
demand, etc. Thus a process faulting on a page invokes a procedure in the 
monitor for that page to bring the page into core. The built-in 
synchronization ability of the monitor ensures that another process does 
not simultaneously attempt to bring the same page into core. 
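The effect of a per page monitor can be approximated as follows. This is 
not Hoare's notation: a Python lock is used as a stand-in for the 
monitor's built-in mutual exclusion, and the class and method names are 
invented for the example. 

```python
import threading


class PageMonitor:
    """Hypothetical stand-in for a monitor assigned to one page."""

    def __init__(self, load):
        self._lock = threading.Lock()  # only one procedure executes at a time
        self._load = load              # routine that reads the page into core
        self.in_core = False

    def bring_in(self):
        """Bring the page into core unless another process already has."""
        with self._lock:
            if not self.in_core:
                self._load()
                self.in_core = True
```

However many processes fault on the page at once, the load routine runs 
only once; the others find the page already in core when they enter. 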

Memory deallocation is done by system processes in Hoare's design. 
Rather than using a single process for each level of memory, Hoare assigns 
the page replacement task to a separate process for each page. When a 
page is brought into main memory in Hoare's system, a process is created 
and started up which periodically tries to throw the page out of main 
memory if it has not been referenced recently. 

Hoare's monitors permit a high level solution to both the page fault 
contention problem and the data base contention problem. The monitors 
assigned to each page are essentially per page locks, solving the page 
fault contention problem. Similarly, putting the other paging system 
functions inside a monitor also guarantees exclusive access to paging 
system data bases. 

While Hoare's monitors allow him to describe his system in a rather 
elegant fashion, the system suffers two serious drawbacks in practice. The 
first is actually implementing the synchronization implicit in the use of 
monitors. There are serious efficiency issues unanswered here because a 
combination of hardware, or "busy" waiting, and software waiting is 
required. 

The second, perhaps more serious deficiency in Hoare's proposal is 
the number of processes involved, one for every page in main memory. 
There is always overhead involved in implementing processes, both in 
keeping track of the state of the process, and scheduling the process at 
the appropriate time. Most systems are not capable of supporting the 
large number of processes required, and most schedulers are not designed 
to give the fast response that would be necessary to make Hoare's scheme 
efficient enough for practical purposes. For these same reasons Hoare's 
system would expand poorly to a system with more levels of memory. 
Adopting the same strategy of one removal process per page would worsen 
the problems of implementing and scheduling the necessary number of 
processes. 

There is an orthogonal viewpoint of paging systems from that taken in 
this thesis, a view which Hoare's description adopts in part. We have 
pictured pages as objects manipulated by system and user processes. 
Instead, each virtual page may be thought of as a process, a process that 
performs all desired actions on the page, moving it in and out of memory, 
wiring it, etc. (Not just removing it from memory as do Hoare's 
processes.) 

This concept of a page as a process has also been used to explain 
Multics page control. (1) As already pointed out above, it is 
prohibitively expensive to actually implement a process for each page; 
however, pages can be thought of as being implemented as very simple 
processes with page control acting as an interpreter for these processes. 
The per page information (e.g. flags, locks) defines the current state of 
each page process; the various actions of the page processes (such as 
wiring themselves, bringing themselves into memory) are done 
interpretively by the page control code. 

A more formal characterization of this view is to define each virtual 
page to be a finite state machine. The state of each such finite state 
machine (page) is defined by the values of all the per page information 
contained in and associated with the page's page table word: the used 
and modified bits, the wired flag, etc. Each transition of the finite 
state machine corresponds to an action performed on the page, and is 
implemented as some page control procedure. 

(1) This description of Multics page control is originally due to Bernard 
Greenberg of Honeywell Information Systems. 

For example, two states of a page are the "in core" state (i.e. 
allocated a core page frame as indicated by the page frame address in the 
page table word) and the "out of core" state (not allocated a core page 
frame as indicated by the fault tag in the page table word). The 
transition from the "in core" state to the "out of core" state is 
implemented by the code of the page replacement algorithm. Conversely the 
transition from "out of core" to "in core" is performed by the allocation 
code. The inputs which cause the various state transitions are requests 
from processes, e.g. a user process wishing to reference a particular page 
may cause that page to move from the "out of core" state to the "in core" 
state (and as a side effect cause some other page to make the transition 
from "in core" to "out of core"). 

Page control, then, emulates these finite state machines by driving 
the pages through the various states in response to the demands of user 
processes. Hoare's monitors, which perform all the allowable actions 
(transitions) on pages, make explicit the concept of a finite state 
machine. The procedures of the monitor directly implement the state 
changes of the page. 
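The two-state example can be written down as an explicit transition 
table. The sketch below is only illustrative: a real page has many more 
states (wired, modified, and so on), and the request names "reference" 
and "replace" are assumptions chosen for the example. 

```python
from enum import Enum


class PageState(Enum):
    OUT_OF_CORE = "out of core"
    IN_CORE = "in core"


# (current state, request) -> next state. Each entry corresponds to some
# page control procedure that carries out the transition.
TRANSITIONS = {
    (PageState.OUT_OF_CORE, "reference"): PageState.IN_CORE,   # allocation code
    (PageState.IN_CORE, "reference"): PageState.IN_CORE,       # no fault occurs
    (PageState.IN_CORE, "replace"): PageState.OUT_OF_CORE,     # replacement code
}


def step(state, request):
    """Drive one page through one transition, as page control would."""
    try:
        return TRANSITIONS[(state, request)]
    except KeyError:
        raise ValueError(f"no transition from {state.value!r} on {request!r}")
```

In this view page control is the interpreter that applies step to each 
page in response to the requests of user processes. 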
3.3.3 Saxena and Bredt's Hierarchical Paging System 

As part of a structured design of an operating system Saxena and 
Bredt [Sa75] include a description of a paging system. Their hierarchical 
operating system consists of four levels, numbered one to four, each level 
built on top of the lower numbered levels (level 0 is the hardware). The 
four levels are: 1. A simple scheduler for running and synchronizing a 
fixed number of system processes. 2. A simple memory manager which 
implements a virtual memory for these system processes. 3. A scheduler 
for implementing and synchronizing a large number of concurrent processes 
using virtual memory. 4. A memory manager for implementation of the 
virtual memory. Essentially the simple scheduler and simple memory 
manager implement system processes which provide complete process 
multiplexing and virtual memory to a large number of user processes. 
Monitors are also used to describe this system, and to solve the data base 
and page fault contention problems. 

The chief distinction of this system from the one presented in 
section 3.3.1 is in the extra scheduler and memory manager. Like Hoare's 
system, only a single level memory is considered. However, unlike Hoare's 
system, only a small fixed number of processes is necessary to implement 
the paging system, because a process is not assigned to each page. Saxena 
and Bredt specify a page replacement process which, like the core manager 
process of Figure 3.2, can operate on any page, rather than creating a 
separate replacement process for each page as Hoare does. And instead of 
assigning a separate monitor to each page, a single monitor performs the 
memory allocation function for all pages. Thus, only one page fault may 
[...] 

[...] and Bredt removes the mutual dependency of the top two levels. 

In practice, the advantages of allowing the memory manager and 
scheduler to take page faults may never be realized. Supposedly, paging 
the memory manager and scheduler will free physical memory for user pages. 
Yet the pages of these two modules are normally so heavily used that they 
will always be in main memory anyway. There is also an efficiency issue 
in allowing the scheduler and memory manager to take page faults, for 
overhead is increased and response time adversely affected. This is a 
major reason why many systems make these two modules permanently 
resident. 

Hence transparency of structure rather than efficiency is the real 
issue. Careful design may eliminate the need for two levels of both the 
memory manager and the scheduler. Such a design has been proposed for 
Multics using a two level scheduler and a single memory manager. A simple 
scheduler implemented below the virtual memory would allow use of 
processes by the virtual memory manager, while a more complex scheduler 
implemented above the virtual memory would implement user processes and be 
able to take page faults. By careful design, the low level scheduler 
does not need to use the virtual memory. 

One of the key questions here is the larger issue of the proper 
structure for an operating system. We have concentrated on the design of 
just one part of an operating system, the paging system. The previous 
discussion points out the need for considering the design of the paging 
system in the context of the overall system structure. The general 
problem of structuring operating systems has been treated by many 
researchers [Li72] [Di68b] [Ha70], and is beyond the scope of this thesis. 
3.3.4 System Versus Combination Paging Systems 

Little has been said to this point about system-process paging 
systems, with the exception of the discussion in section 3.2.2 considering 
Multics as a system process paging system with a single page control 
process. To remedy this deficiency, we discuss in this section how the 
two process combination paging system presented in section 3.3.1 (and 
implemented on Multics as discussed in Chapter 4) could become a system 
process paging system using three system processes to implement the page 
control functions. 

The combination paging system of section 3.3.1 can be made into a 
pure system process paging system by removing page fault handling (memory 
allocation) from the user processes. Instead, a third system process will 
be assigned the page fault handling job. Thus a user process taking a 
page fault sends an interprocess message to this fault handling process 
which performs the steps of Figure 3.4. When the faulted on page has been 
read into the allocated page frame, a message is sent back to the faulting 
process, starting it up again. 
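The message exchange between a faulting process and the fault handling 
process might look like the following sketch. Queues are used here as a 
stand-in for interprocess messages; the function names, the reply queue 
carried in each message, and the None shutdown message are assumptions 
made for the example. 

```python
import queue


def fault_handler(requests, resolve):
    """Hypothetical fault handling process: serves one fault at a time."""
    while True:
        message = requests.get()       # wait for a page fault report
        if message is None:            # shutdown message (an added convention)
            break
        page, reply = message
        frame = resolve(page)          # allocate, bind and read the page
        reply.put(frame)               # start the faulting process up again


def take_page_fault(requests, page):
    """Run by the faulting user process: report the fault and wait."""
    reply = queue.Queue(maxsize=1)
    requests.put((page, reply))
    return reply.get()                 # blocked until the page has been read in
```

The single loop makes the serialization visible: a second fault is not 
looked at until the first has been resolved. 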

The essential difference between such a three process system page 
control and the two process combination page control is that memory 
allocation (page fault resolution) is occurring in a single system process 
instead of in many user processes. This has major implications in two 
areas: security and efficiency. 
The system process design seemingly offers improved system security. 
The memory allocation code and the data bases referenced by this code are 
removed from the address space of the user's process. This not only makes 
the user's address space smaller and more compact, but makes it impossible 
for the user to intentionally or inadvertently damage this code and data 
and thereby affect other users. This separation is important in systems 
with no protection mechanisms, but since most computer systems do offer 
some means of protection (e.g. supervisor mode, write protected memory, or 
rings as in Multics) there is likely to be little if any extra protection 
from the user afforded in practice by handling page faults in a separate 
process. 

More significant is the effect of the page fault handling process on 
system efficiency. First, there is the extra overhead required by the 
interprocess messages needed to report the page fault to the system 
process, and to signal completion of the fault to the faulting process. 
Even if the message sending overhead can be minimized, there is the 
additional expense of scheduling, that is saving the state of the faulting 
process and starting the page fault process, and vice versa when the fault 
is completed. 

There is yet another consideration with respect to efficiency, 
important in multi-processor configurations. Namely, only one page fault 
may be processed at any time, because there is a single page fault 
handling process to resolve page faults. While this could conceivably be 
remedied by explicitly adding a page fault handling process for each 
physical processor, note that the combination paging system does this 
implicitly by having the user process resolve the page fault. Since as 
many user processes may be executing simultaneously as there are physical 
processors, the combination paging system automatically expands or 
contracts the number of processes handling page faults at any time. 

Of course, the preceding argument is irrelevant if a global lock is 
employed to prevent data base contention, because then only a single 
process may be resolving a page fault in any case. However, Chapter 5 
will describe how using system processes enables splitting the global lock 
into several locks. Hence the tangible differences between the two 
designs are likely to be slight, and the decision as to which is best for 
a given system will depend heavily on such factors as the locking strategy 
and how efficient the implementation of processes is. 

3.4 Advantages of Multi-Process Paging Systems 

Having examined numerous multi-process paging systems, the question 
arises as to the superiority of such designs over a conventional design 
such as the Multics page control described in section 3.2.1. There are 
four areas where the multi-process designs offer decided advantages: 
simplicity, modularity, security, and expandability. 

While these advantages accrue to all multi-process designs appearing 
in section 3.3, the following discussion pertains directly to the two 
process design presented in section 3.3.1 whose implementation is 
discussed in the next chapter. 
3.4.1 Simplicity 

The multi-process design is clearer and easier to understand due to 
the separation of the allocation and deallocation tasks into separate 
processes. Both the core manager process and the paging device manager 
are simple, sequential algorithms which can be understood without 
reference to the other parts of the paging system. In contrast, the 
corresponding algorithms in Multics are intertwined in a complex manner. 
This complexity is largely due to the fact that the three tasks split into 
separate processes by the multi-process design are lumped into a single 
process, that which takes the page fault. This process becomes something 
of a three ring circus, trying to do everything at once: free space on 
the paging device, free space in main memory, resolve the page fault. In 
order to do so, an ordering must be imposed on these tasks, since a single 
process must do things sequentially. The fundamental problem here is 
caused by trying to place a sequential order on inherently parallel tasks. 
There is no satisfactory way to avoid these difficulties except to realize 
the parallel nature of these tasks and allow them to be done in parallel. 

Separate processes also greatly simplify the treatment of i/o 
interrupts. The chief source of difficulty with input and output 
operations is the relatively long time they require relative to 
instruction execution times. We have already seen that in the Multics 
page control the process which starts a read write sequence does not wait 
for the disk write to complete, since to do so would delay page fault 
resolution. Therefore the completion of the read write sequence must be 
noticed by whatever process is around at the time. This of course 
complicates things, as all processes must be ready to pick up where 
someone else left off. 

On the other hand, the paging device manager process can wait for a 
read write sequence to complete, since its job is mostly nothing but 
performing read write sequences. Similarly, the core manager process, 
once a write has been started to copy a page to the paging device or to 
disk, can simply wait until the write is finished. 

Essentially we are arguing in favor of a separate process for 
performing i/o (e.g. the paging device manager process doing the i/o for 
read write sequences) as opposed to a traditional interrupt handler, which 
spreads the i/o among whatever process is executing. There are two 
chief advantages of the process approach over the interrupt handler. 

The first of these is the clarity of structure of the process 
approach. The sequential nature of a read write sequence is obvious from 
the paging device manager's algorithm: start a read, wait for the read to 
complete, start the write, wait for the write to complete. In contrast, 
the same algorithm implemented in an interrupt handler obscures the fact 
that a disk write always follows a bulk store read in performing read 
write sequences. Some process starts the read; when the read completes 
the interrupt handler receives control. Interrupt handlers are invariably 
written as dispatchers: the source of the interrupt is determined and 
appropriate routines performed to do whatever is necessary. Thus, after 
determining the read portion of a read write sequence has completed, the 
interrupt handler starts the write. The interrupt handler regains control 
later, on completion of the write, and finishes up. 

In other words, the process which starts the i/o is best equipped to 
know what actions should be taken when the i/o completes. Having a 
process perform i/o allows us to take advantage of this fact, while using 
an interrupt handler places all knowledge of what action to take in the 
interrupt handler code, forcing the interrupt handler to sort out all the 
various possibilities. 
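The contrast in structure can be seen by writing the same read write 
sequence both ways. In this sketch dictionaries stand in for the bulk 
store and the disk, transfers complete immediately, and all names are 
hypothetical; only the shape of the code is of interest. 

```python
def read_write_sequence(bulk_store, disk, record):
    """Process style: the order of the two transfers is plain to see."""
    data = bulk_store[record]   # start the read and wait for it to complete
    disk[record] = data         # start the write and wait for it to complete


class InterruptHandler:
    """Interrupt style: the same sequence, driven by completion interrupts."""

    def __init__(self, bulk_store, disk):
        self.bulk_store = bulk_store
        self.disk = disk
        self.phase = {}         # record -> "reading" or "writing"
        self.buffer = {}        # record -> data already read

    def start(self, record):
        self.phase[record] = "reading"          # some process starts the read

    def interrupt(self, record):
        # Dispatch on the source of the interrupt and the saved phase.
        if self.phase[record] == "reading":     # read done: start the write
            self.buffer[record] = self.bulk_store[record]
            self.phase[record] = "writing"
        elif self.phase[record] == "writing":   # write done: finish up
            self.disk[record] = self.buffer.pop(record)
            del self.phase[record]
```

Both versions move the same data, but in the second the fact that the 
write follows the read is recorded only in the saved phase, which the 
handler must inspect on every interrupt. 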

The second major advantage of the process approach is that it 
permits formalized interprocess communication mechanisms to be used in 
implementing the i/o. Block and notify primitives may be used by the i/o 
process, which blocks after starting i/o. The process receiving the 
interrupt merely turns it into an interprocess notification (the "notify" 
of Figure 3.5). The awakened i/o process then continues with whatever 
steps are appropriate upon completion of the i/o. In addition, the i/o 
process can, if necessary, wait on a lock, where an interrupt handler 
cannot (since the interrupt handler may have interrupted the process that 
locked the lock). 

The end result is a simplification of the treatment of interrupts; 
only the lowest level of the system, directly above the hardware, need be 
aware of and deal with interrupts. All the processes performing i/o 
implement the i/o in terms of waiting on events using the standard 
interprocess communication tools. 
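A minimal sketch of the block and notify pattern follows, using a Python 
event as a stand-in for the interprocess notification. The class, the 
log, and the on_interrupt function are hypothetical names introduced for 
the example. 

```python
import threading


class IoProcess:
    """Hypothetical i/o process that blocks until notified of completion."""

    def __init__(self):
        self.done = threading.Event()  # stand-in for the notify primitive
        self.log = []

    def run(self, start_io):
        start_io()                     # start the transfer
        self.log.append("i/o started")
        self.done.wait()               # block until the notification arrives
        self.log.append("i/o completed")


def on_interrupt(io_process):
    """Lowest level: turn the hardware interrupt into a notification."""
    io_process.done.set()
```

Only on_interrupt knows that an interrupt occurred; the i/o process sees 
nothing but an event it was waiting for. 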

The philosophy of using separate processes for i/o in place of 
interrupt handlers is given in more detail by Clark [Cl74]. 

Dedicating a process to manage the paging device allows another 
simplification in performing read write sequences. A read write sequence 
requires a main memory page frame. If any process may start a read write 
sequence it may be difficult to obtain the necessary page frame without 
adding complex module interconnections. Since the paging device manager 
repeatedly performs read write sequences, a main memory page frame may be 
assigned to the paging device manager permanently for use as a buffer, 
avoiding the problem of dynamic allocation. This solution is possible in 
the Multics page control, but much more difficult for two reasons: 1) 
Since any process may start a read write sequence, any page frame used as 
a buffer must be protected against multiple simultaneous use. (Note in 
the multi-process scheme the paging device manager process acts as a lock 
on the frame used as a buffer.) 2) A single process may start several 
read write sequences at the same time. (This is how the Multics page 
control achieves parallelism.) This would require that several page 
frames be available as main memory buffers. 
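The permanently assigned buffer can be illustrated as follows. The page 
size, the dictionaries standing in for the bulk store and the disk, and 
the class and method names are assumptions made for the sketch. 

```python
PAGE_SIZE = 4  # unrealistically small, for illustration only


class PagingDeviceManager:
    """Hypothetical manager owning one permanent main memory buffer frame."""

    def __init__(self, bulk_store, disk):
        self.bulk_store = bulk_store
        self.disk = disk
        self.buffer = bytearray(PAGE_SIZE)  # assigned once, never reallocated

    def read_write(self, record):
        # No lock is needed on the buffer: only this process ever uses it,
        # and it performs one read write sequence at a time.
        self.buffer[:] = self.bulk_store[record]  # bulk store read into buffer
        self.disk[record] = bytes(self.buffer)    # disk write from buffer
```

Every sequence reuses the same frame, so no call on the main memory 
allocator is needed while a read write sequence is in progress. 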

The factors just discussed result in a simpler, easier to understand 
paging system. This has important ramifications in many areas. Since the 
code is simpler and more understandable, it is easier to modify and 
maintain. This is valuable not only in testing and debugging the code, 
but in being able to change the algorithms at a later date with confidence 
that the system will continue to work, and to be able to predict any 
changes in system performance. For the same reason, the code would be 
easier to certify, or to use in proving a given property about the system. 

3.4.2 Modularity 

The separation of the main memory page replacement function and the 
paging device page replacement function into separate processes makes 
possible a much cleaner modularization of page control. This is apparent 
by comparing Figures 3.1 and 3.5. For example, it is clear from Figure 
3.5 that the main memory replacement algorithm (represented by the box 
labeled "get core") is part of the core manager process, and is invoked 
only by the core manager. This is not the case with the Multics design of 
Figure 3.1, where when performing paging device page replacement we can 
suddenly find ourselves executing the main memory page replacement 
algorithm. 

Improved modularity reduces the possible paths through the code, i.e. 
lessens the interconnections between modules, and simplifies the 
interfaces between the resulting program modules. Many of the benefits of 
better modularity match those discussed in conjunction with 
simplification. However, though improved modularity and greater 
simplicity complement each other they are not the same thing. Modularity 
can be bought at the expense of complicating the individual modules; 
conversely a system often can be made to seem simpler by increasing the 
number of modules. 

The most important advantage of the modularity of the multi-process 
design appears when considering adaptations of the design to other 
systems. For example, consider a computer system with paging but without 
the multi-level memory assumed in Figure 2.1, i.e. consisting only of main 
memory and disks, without a paging device. To use the two process design 
presented in section 3.3.1 would require elimination of the paging device 
manager and a slight change to the core manager so that pages evicted from 
main memory were always written to disk. Similarly, if another level of 
memory were added, another module analogous to the paging device manager 
could be added in a relatively straightforward manner to manage the 
additional memory. That is, the design expands and contracts easily and 
modularly to fit any multi-level memory system. Either of these two 
modifications would necessitate extensive, major alterations to the page 
control of Figure 3.1, due largely to the lack of functional modularity. 

3.4.3 Security 

The multi-process design presented here offers significant security 
advantages over a traditional scheme. By security we mean the prevention 
of unauthorized release or modification of information (either procedures
or data). Dividing page control into separate processes increases
security between parts of the system, and allows separation of policy from 
mechanism within page control. 

Protection of the user from the system, or the system from the user,
is not directly enhanced where mechanisms such as supervisor mode, rings,
etc. already exist. However, the advantages of simplicity and modularity
previously discussed would make any attempts at certification of the
multi-process page control much easier. For example, the places that read
and write arbitrary pages are localised and easily identifiable, and few
in number.

Security between parts of the system is affected by the separate 
address space afforded each page control process. For instance, only the 
paging device manager process need be permitted to execute the paging 
device page replacement algorithm. Since the paging device used list is
used primarily for this task, we can also restrict access to the paging
device used list to the paging device manager process. No other processes
need access to this list.

Separation of policy from mechanism is possible if the system offers
rings as does Multics (or some other form of protection domains) [Sc75].
The address space of each page control process can further be divided by
use of these protection rings. The programs implementing the mechanics of
paging, e.g. reading or writing a page from or to disk, adding or removing
a page frame from a list, gathering usage statistics, etc. can be placed
in the most privileged ring. The policy algorithms, e.g. deciding what
page to remove from primary memory, execute in a less privileged ring, and
must call the inner ring procedures to get the information needed and to 
actually implement the decisions made. Thus the failure of the policy 
algorithms could never cause unauthorized use or modification of the 
information in the pages. The system could be certified without having to 
certify these policy components. (Failure of the policy algorithms could 
still result in denial of service.) 
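
The division between mechanism and policy described above can be
illustrated with a minimal Python sketch. The names used here
(PagingMechanism, usage_stats, evict, lru_policy) are hypothetical and do
not come from the thesis, the least-recently-used rule is only an assumed
example policy, and ordinary object boundaries merely stand in for
protection rings. The point of the sketch is that the policy receives
usage statistics and returns a frame number, and never sees page contents.

```python
class PagingMechanism:
    """Stands in for the inner-ring procedures (hypothetical names)."""

    def __init__(self, contents):
        # frame number -> page contents; never handed to the policy
        self._contents = dict(contents)
        self._last_used = {frame: 0 for frame in self._contents}
        self._clock = 0
        self.evicted = []

    def touch(self, frame):
        # Record a reference to a frame (usage statistics gathering).
        self._clock += 1
        self._last_used[frame] = self._clock

    def usage_stats(self):
        # Only usage information crosses the boundary, and only as a copy.
        return dict(self._last_used)

    def evict(self, frame):
        # The mechanism validates whatever the policy asks for.
        if frame not in self._contents:
            raise ValueError("policy chose a frame that is not in use")
        self.evicted.append(frame)
        del self._contents[frame]
        del self._last_used[frame]


def lru_policy(stats):
    """Stands in for the outer-ring policy: pick the least recently used frame."""
    return min(stats, key=stats.get)


mech = PagingMechanism({0: b"a", 1: b"b", 2: b"c"})
mech.touch(0)
mech.touch(2)
victim = lru_policy(mech.usage_stats())
mech.evict(victim)
```

A faulty policy in this sketch can choose a poor victim or an invalid
frame, which corresponds to denial of service, but it has no path to the
page contents themselves.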

To summarize, the separation of the parts of page control permitted
by the multi-process design effectively allows extra "fire-walls" between
pieces of the system and between procedures implementing mechanisms
and procedures deciding policy.

3.4.4 Expandability 

Expandability encompasses two ideas. One has been mentioned in the
discussion of modularity and might better be termed adaptability, namely
the ability to add another manager process to the paging system to
manipulate another level of memory. The second aspect of expandability is
the ability to increase the number of processes executing as core or
paging device managers as the size of the computer system grows.

In a generalised computer utility with multiple processing units and
large amounts of memory, a point will eventually be reached where a single
core manager process will be unable to supply free main memory page frames
fast enough, even if the [...] with
several processors there will be multiple user processes executing
simultaneously, each taking page faults and demanding page frames. In
such a situation, the solution is to create additional core manager
processes (or paging device manager processes) as needed to supply free
page frames at a sufficient rate. All of the core manager processes would
be identical, and follow the algorithm [...] 3.2.

This design would be rather inefficient if the global locking
strategy used by Multics is employed. The multi-process design, however,
enables elimination of this lock by structuring the paging system's data
bases into distinct parts, each of which needs to be accessed only by a
single process (or type of process, e.g. if there are multiple core
manager processes). This would significantly decrease the interference
among processes, producing a corresponding increase in system efficiency. 
This issue is considered in more detail in Chapter 5. 

To conclude, the multi-process design offers advantages in 
simplicity, ease of understanding, increased functional modularity,
enhanced user and system security, adaptability and expandability. The
implementation described in the next chapter demonstrated that these are
not just theoretical benefits but offer practical advantages as well.

CHAPTER 4
A Multics Implementation of Multi-process Page Control 

4.1 The Multics Implementation 

Many readers will doubtlessly be strongly tempted to skip this 
chapter; we urge this temptation be resisted. Although the topic of this 
chapter is an actual implementation on the Multics system of the 
multi-process paging system presented in section 3.1.1, the emphasis is 
not on the details of Multics or the particular implementation of a paging 
system. Rather, the emphasis is on the insights gained into the design by 
its implementation. There are always problems arising in implementing a 
system that are not apparent from the design of the system. The purposes 
of implementing a real multi-process paging system were to demonstrate the 
validity of the design, determine if the system's theoretical benefits 
were manifested in practice, and to measure the performance of such a 
system.

4.1.1 Size and Scope of the Implementation

To give some idea of the size of the system implemented, the standard
Multics page control consists of 28 modules written in assembly language
and PL/1. These total approximately 4700 source statements, 3600 in
assembly language and 1100 in PL/1, which compile into almost 11,000 lines
(words) of object code. To implement the multi-process design, extensive
changes were necessary. These changes are summarized in Appendix A, which
lists the modules in the Multics page control that were changed or
deleted, and the modules that were added. Appendix B lists the program
modules required for the multi-process page control. For ease of
implementation, the entire multi-process page control was written in PL/1
except where already existing components written in assembly language were
used with little or no alteration. The size of each of the modules in
source statements is also listed in Appendix B, along with the size of the
object code for each program. Excluding minor changes in existing modules
and some changes to the scheduler needed to enable implementing page
control as system processes, approximately 1700 PL/1 statements were
written. The total size of the 32 modules comprising multi-process page
control was roughly 3700 source statements, 1500 in assembly language and
2200 in PL/1. Note the number of PL/1 source statements doubled while the
number of assembly language source lines was reduced by more than half.
Because
of the large increase in PL/1 source lines, the resulting modules compiled
into slightly more than 13,000 lines (words) of object code. This increase
in size was due to the effect of writing the programs in a higher level
language.

The structure of the implemented system was identical to that
illustrated in Figure 3.5. Both system processes, the core manager
process and the paging device manager process, were driven by control
procedures named "core_manager" and "pd_manager" respectively. These
programs received wakeup signals from other processes, determined what
action to take as a result of those signals, called the necessary routines
to accomplish that action and then signalled the completion of that action
to any waiting process before blocking the system process. A more
specific idea of how these processes work may be gotten from Appendix G,
which contains some of the actual PL/1 source programs for the
core_manager and pd_manager modules. For completeness, comparable code
from the third part of the system, the page fault path, is also included.
This is the code that runs as part of the user process and is responsible 
for resolving page faults. 
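
The control loop just described (block, receive a wakeup, decide what to
do, call the routine, signal completion, block again) might be sketched
in Python as follows. This is an illustration under stated assumptions
and not the PL/1 code of the appendix: a thread stands in for the system
process, a queue for the wakeup mechanism, and the "need_frames" and
"shutdown" reasons are hypothetical names, the latter present only so
that the sketch terminates.

```python
import queue
import threading


def manager(name, wakeups, actions, log):
    """Control loop: block, wake, dispatch, signal completion, block again."""
    while True:
        reason, done = wakeups.get()      # blocks until a wakeup arrives
        if reason == "shutdown":          # only so the sketch can terminate
            done.set()
            return
        actions[reason]()                 # call the routine for this wakeup
        log.append((name, reason))
        done.set()                        # signal completion to the waiter


free_frames = []
log = []
wakeups = queue.Queue()
# Hypothetical action table: one routine per kind of wakeup.
actions = {"need_frames": lambda: free_frames.extend([10, 11])}

worker = threading.Thread(
    target=manager, args=("core_manager", wakeups, actions, log)
)
worker.start()

for reason in ("need_frames", "shutdown"):
    done = threading.Event()
    wakeups.put((reason, done))
    done.wait()                           # the requester waits for completion
worker.join()
```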

4.1.2 Differences of the Implementation from the Model 

There were several points in the actual implementation where it was 
found necessary to deviate from what the model implies. One of the most 
significant of these was in the mechanism used to implement the core and 
paging device manager processes. The model does not differentiate between 
the system processes used to implement the core manager and paging device 
manager and the typical user process except in the functions they perform. 
In practice, however, they may need to be implemented differently in order
to obtain the efficiency and responsiveness required for system functions.
Additionally, the system processes must be able to operate without taking
page faults, since they are used to implement page faults.

Hence a special type of process, simpler and involving less overhead
than a full Multics process, was used to implement the core manager and
paging device manager processes. All procedures, tables, and
temporary variables used by the core and paging device manager processes 
were fixed permanently in main memory. The processes also lacked the 
ability to add new segments to their address space, but this is not an 
ability needed by the page control processes anyway. 

The manager processes were also restricted from using the full
interprocess communication mechanism of Multics, because to permit them to
use this facility would have required much more code and data be kept in
main memory permanently. Instead, less powerful primitives were used
which allowed processes to wait on events and signal the occurrence of
events but did not allow interprocess message sending. The use of these
primitives, which were already part of the standard Multics system, had
some performance implications because of their interaction with the
Multics scheduler. Therefore, a special set of primitives was implemented
and used only for waking up the memory manager processes. These
primitives ensured that once either of the system processes was ready to
run, it was started as soon as possible.

Another difficulty involving the wait primitive arose from the 
restricted environment a process operates in after a page fault. At this 
time, the faulting process cannot take another page fault, thus it must
run on a wired stack. Multics does not provide a wired stack on a per
process basis, but rather on a per physical processor basis. In a
situation where a process needs a wired stack, it uses the wired stack
(the "prds", or processor data segment, in Multics terminology) associated
with the physical processor currently executing the process. This has
severe consequences for the waiting operation. If a process surrenders
the processor while using the prds as a stack, its stack history is lost.
The next process to run may overwrite the prds stack, and even if this
could be prevented the process may run on a different physical processor
(with a different prds) when restarted.

The result of this restriction is that if a process resolving a page
fault must wait in a manner which requires abandoning the processor, it
must do so at a point where it has no stack history on the prds. This
situation arises in the implemented multi-process page control when a
faulting process calls the main memory page frame allocator, which
discovers there are currently no free page frames. At this point the core
manager is signalled to free more page frames, but the faulting process
must wait, blocking itself and surrendering the processor. If the faulting
process did not give up the processor, the core manager process might
never be able to run (e.g. in a single processor system). Thus the stack
history at this point must be lost. This is not too severe, since nothing
has really been done up to this point other than determining what page
caused the fault. The mechanism used to solve this problem is to have the
wait primitive note the process is running on the prds, and restart the
process by repeating the instruction that caused the page fault when the
process is unblocked. This same action, repeating the faulting
instruction, is also used to restart a process waiting for the completion
of a read operation to bring a faulted-on page into core. In the first
case, since the fault has not been resolved, the page fault code is
invoked again, but
this time there should be a page frame available. In the latter case, the 
fault has been successfully resolved, and the process continues merrily on 
its way. 
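
The restart-by-repetition mechanism can be illustrated with a small,
single-threaded Python sketch. All names here are hypothetical. An
exception stands in for blocking with the stack history discarded, and a
direct call to core_manager_run stands in for the core manager running
while the faulting process is blocked; the loop in execute plays the role
of repeating the faulting instruction.

```python
class NoFreeFrame(Exception):
    """Raised where the real system would block and drop its stack history."""


free_list = []
resident = set()
state = {"faults": 0}


def core_manager_run():
    # Stand-in for the core manager freeing frames while the faulter is blocked.
    free_list.extend([7, 8])


def handle_page_fault(page):
    state["faults"] += 1
    if not free_list:
        # Nothing has been done yet beyond identifying the page,
        # so abandoning this invocation loses no work.
        raise NoFreeFrame(page)
    free_list.pop()
    resident.add(page)


def execute(page):
    """Repeat the 'instruction' (a reference to `page`) until it succeeds."""
    while page not in resident:
        try:
            handle_page_fault(page)
        except NoFreeFrame:
            core_manager_run()            # the process is unblocked after this
    return page


execute(42)
```

In this run the fault handler is entered twice: once when no frame is
free, and once more after the repeated reference finds a frame available.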

To summarize, the implementation differences were due primarily to
the simpler type of process used to implement the core and paging device
manager processes, which imposed some restrictions on the functions these
processes could perform, and to the strategy used on Multics for
implementing a wired stack. The other differences from the model due to 
segmentation are presented in section 4.2, and result in adding extra 
functions required to deal with segmentation to the job of the system 
processes. 

4.1.3 Performance 

To compare the performance of the multi-process paging system with 
the standard Multics paging system, a system benchmark was run using both 
systems. A slight change was made to the standard system in order to 
obtain more meaningful results for comparison. The reason for this change 
was the larger size of the multi-process page control system. Nine
additional pages of memory were devoted to permanently wired system
programs and data in the multi-process page control. This meant that the
primary memory available for holding user pages was reduced by a
corresponding amount. So that the size of main memory usable for paging
by user processes would be comparable in both cases, nine additional pages
were wired in the standard system and left empty.

This modification did not make the size of the pageable memory 
exactly equal on both systems. The multi-process page control keeps a 
free list, and the number of frames on this list varies constantly as the 
core manager adds page frames and faulting processes request them. Each 
page frame on the free list reduces the amount of memory
available for paging; if, on the average, two page frames are on the free
list, the effective pageable memory has been reduced by two pages. When
the benchmark was run, the core manager was set to keep between four and
eight page frames free. (That is, when awakened, the core manager would
keep freeing page frames until there were eight; the allocating procedure
would wake up the core manager when the number fell below four.) A
very conservative estimate is that on the average three pages were on the
free list. To compensate for this effect, another three pages were left
empty and wired when running the benchmark on the standard system, for a
total of twelve (the previous nine pages due to the increased wired code
and data plus three to compensate for the pages on the free list).
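
The free list thresholds described above can be sketched as follows. This
is a simplified illustration, not the implemented allocator: the names
are hypothetical, the wakeup is modelled as a direct call rather than a
signal to a separate process, and eviction simply takes the oldest in-use
frame.

```python
LOW_WATER = 4    # wake the core manager when the free count falls below this
HIGH_WATER = 8   # the core manager frees frames until this many are free


class FreeList:
    def __init__(self, free, in_use):
        self.free = list(free)
        self.in_use = list(in_use)
        self.wakeups = 0

    def core_manager(self):
        # Evict (here: the oldest in-use frame) until the high-water mark.
        while len(self.free) < HIGH_WATER and self.in_use:
            self.free.append(self.in_use.pop(0))

    def allocate(self):
        frame = self.free.pop()
        self.in_use.append(frame)
        if len(self.free) < LOW_WATER:
            self.wakeups += 1
            self.core_manager()   # synchronous stand-in for a wakeup signal
        return frame


frames = FreeList(free=range(8), in_use=range(8, 32))
for _ in range(10):
    frames.allocate()

# Pages left empty and wired on the standard system for the benchmark:
# nine for the extra wired code and data, three for the free list estimate.
wired_compensation = 9 + 3
```

Starting from eight free frames, ten allocations in this sketch wake the
core manager twice, each time when the free count drops to three.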

The results of running both systems are summarized in Figure 4.1. (1)
The multi-process page control system took 8.7% more page faults. The
increase in page faults is accounted for by three effects. The first of
these is the inability of the adjustment described in the preceding
paragraph to make the effective pageable memory exactly equal for both
systems. The second effect is due to differences in the algorithms used
for page replacement, specifically in when pages are replaced. Since the
multi-process page control evicts pages before the system runs out of free
page frames while the standard system only replaces pages when no free
page frames are left, the pages held in memory at any given time may
differ. Given the same execution sequence, changing the pages in memory
will cause a different fault pattern and fault rate. Third, the average
time to resolve a fault changed, as Figure 4.1 shows. Any difference in
the time required for any event in a multiprocessing environment can alter
the pattern of page faults by changing the contents of the memory.

                                Standard MIT    Multi-process page control
                                System          Actual        Estimated

Number of page faults           60,261          65,504        65,504

Average time to process a
page fault (microsec.)          1973            2043          1226

Total CPU time attributable
to paging (sec.)                119             307           184

CPU time spent (sec.):

  processing page faults                        134           80

  in core manager process                       141           85

  in paging device manager
  process                                       32            19

Figure 4.1

Performance of multi-process page control

(1) While useful for comparison, these numbers were obtained in a special
test environment and do not reflect the normal operating performance of
Multics.

Although the average time spent processing a page fault remained
relatively constant, these times are measured differently and are not
directly comparable. Since page replacement in both main memory and the
paging device is done at page fault time in the standard page control,
that time is included in the time to process a page fault, while this time
is attributed to the core manager or paging device manager in the 
multi-process scheme. Thus one would expect the time spent processing a 
page fault to be much less for the multi-process implementation. 

The fact that the time is not smaller is due to the overhead of PL/1.
In the standard system, all but a small fraction of the code that runs at
page fault time is written in assembly language. In the multi-process 
system the situation is reversed, with the large majority of the programs 
written in PL/1. There are two sources of overhead attributable to PL/1 
at work here. One is the fact that in general, algorithms written in 
assembly language are shorter and execute faster than the same algorithm 
written in PL/1. (In some cases, the object code generated by PL/1 may be
a factor of two or three larger.) Second, and more important, is the
overhead involved in making a PL/1 external procedure call. In the
assembly language version, subroutine calls and returns are made via a 
single transfer instruction. A more complex sequence is required in PL/1 
so that the stack and the PL/1 environment are managed properly. 

In Multics, measurements have shown that a PL/1 external procedure
call requires on the average 67 microseconds. This figure is for a call 
with no arguments; each argument passed adds approximately two 
microseconds. The path followed after a page fault occurs in the 
multi-process page control involves twelve external calls. Using 70 
microseconds as an average time for one external call (i.e. assuming one 
and a half arguments per call), this means that a total of 840
microseconds of the average 2043 microseconds required to resolve a page 
fault, or about 41% of the total, is due to the procedure call overhead 
alone. 
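
The arithmetic behind this estimate can be checked directly. The figures
below are those quoted in the text; the variable names are only
illustrative.

```python
CALL_BASE_US = 67        # external call with no arguments (microseconds)
PER_ARG_US = 2           # approximate added cost per argument
ARGS_PER_CALL = 1.5      # average assumed in the text
CALLS_PER_FAULT = 12     # external calls on the page fault path
FAULT_TIME_US = 2043     # measured average time to resolve a page fault

per_call_us = CALL_BASE_US + PER_ARG_US * ARGS_PER_CALL   # 70.0
overhead_us = CALLS_PER_FAULT * per_call_us               # 840.0
overhead_share = overhead_us / FAULT_TIME_US              # about 0.41
```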

A similar calculation shows that twelve PL/1 calls are also executed
in the course of freeing one main memory page frame. Measurements from
the benchmark show, assuming that all of the time spent by the core
manager was spent freeing page frames (not strictly true, see sections 4.2 
and 4.3), that an average of roughly 2100 microseconds was required to 
free one page frame. Again, the PL/1 call overhead was 840 microseconds, 
or about 40%. 

Using this figure of 40%, and reducing the amount of time spent by 
each component of page control in the actual benchmark by 40%, gives the 
results shown in Figure 4.1 as the estimated performance of multi-process 
page control. This shows the estimated performance improvement if all the 
external PL/1 calls were changed to internal procedure calls.

There is a smaller effect due to the repetition of certain steps in
each PL/1 program. For example, pointers to data bases may have to be
initialized in several procedures, instead of just once as in the assembly
language version of page control. Another factor in the increased
percentage of processor time attributable to the paging system is the fact
that some operations included in the total time attributed to the
multi-process page control are not counted towards the overhead of the
standard Multics paging system (see sections 4.2 and 4.3). While it is
extremely difficult to estimate the effect of these two factors on
performance, their elimination might result in a further improvement of
5-10% over the estimates in Figure 4.1.
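
The estimated column of Figure 4.1 can be reproduced by applying the 40%
reduction to each measured component of the multi-process page control.
The short calculation below does only that; it does not include the
further 5-10% improvement mentioned above, and the names are illustrative.

```python
OVERHEAD_SHARE = 0.40   # fraction attributed to external PL/1 calls in the text

actual_fault_time_us = 2043
actual_cpu_seconds = {
    "processing page faults": 134,
    "in core manager process": 141,
    "in paging device manager process": 32,
}


def scaled(value):
    # Remove the call overhead share and round to the nearest whole unit.
    return round(value * (1 - OVERHEAD_SHARE))


estimated_fault_time_us = scaled(actual_fault_time_us)
estimated_cpu_seconds = {k: scaled(v) for k, v in actual_cpu_seconds.items()}
actual_total = sum(actual_cpu_seconds.values())
estimated_total = sum(estimated_cpu_seconds.values())
```

The three components sum to the totals given in the figure, 307 seconds
measured and 184 seconds estimated.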

Achieving a performance level equalling or improving upon the current
Multics page control was not a goal of the test implementation. However,
it is the author's belief that the multi-process implementation is not
inherently less efficient; it could be made much more comparable if
appropriate programming style were used, such as only using internal
procedure calls, which Multics PL/1 implements very efficiently, and using
global variables.

4.2 The Interface with Segment Control 

Multics is a segmented system and has the concept of "active" and
"inactive" segments as discussed in section 2.1.4. This necessitates
some extra functions in page control, which lead to a more complex core
manager and paging device manager than would otherwise be the case. The
extra functions that must be added to page control, and the complications
these extra functions introduce, are examined in the next two sections.



4.2.1 Necessary Segment Control Functions 

The chief area of contention between segment control and page control 
is the page table. Page tables are allocated by segment control, but must 
be maintained by page control. When segment control wants to perform an 
action which may affect the page table words, it must call upon page 
control. In the case of the multi-process design of this thesis, that 
means the core manager and paging device manager processes. 

There are four segment control functions which affect page table
words. These are:

1. Activating a segment, which requires the file map containing the
   permanent disk addresses of the segment's pages be copied into the
   just allocated page table.

2. Changing the size of the page table (a "boundsfault" in Multics),
   which requires the contents of a page table be copied into a new,
   larger page table when a segment grows.

3. Deactivation, which flushes the segment's pages back to disk.

4. Truncation, which deletes some or all of the pages of a segment,
   requiring the deletion of all copies of those pages in all levels of
   the memory system.

Of these four, only two require intervention by the core manager and 
paging device manager processes. Activation does not, because a process 
cannot take a page fault on a segment until the segment has been assigned 
a page table; thus segment control can be responsible for initializing the
page table. Similarly, when a process extends the size of a segment
causing a larger page table to be allocated for it, the process can copy
the page table itself; since no memory is allocated or deallocated, the
core and paging device manager processes need not be involved. On the
other hand, both deactivation and truncation explicitly require memory
deallocation, and thus the assistance of the memory manager processes,
whose job it is
to do memory deallocation. 

Deactivation requires the "cleaning up" of any pages of the segment
remaining in memory. Pages of inactive segments cannot stay in main
memory or on the paging device because there will no longer be page table
words for these pages. Thus the paging device manager must perform read
write sequences on any pages of the segment being deactivated that reside
on the paging device. Any pages in main memory must also be evicted, and
the core manager must ensure that the evicted pages are not put back on
the paging device. 

Truncation is somewhat easier in one respect, for no i/o need be 
done. Since the pages are being deleted, copies residing on the paging 
device or in main memory may simply be discarded, and their page frames 
claimed and added to the appropriate free list. Any disk copies of the 
deleted pages must also be thrown away, and the disk records they occupied 
returned to the file system for future reuse. 
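
The difference between the two operations can be shown with a small
sketch in which each memory level is a Python dictionary keyed by
(segment, page). The function and variable names are hypothetical, page
frame bookkeeping is omitted, and the sketch assumes that a main memory
copy of a page is newer than a paging device copy of the same page.

```python
def deactivate(segment, core, paging_device, disk):
    """Flush every page of `segment` to disk; nothing may remain in memory."""
    # Paging device copies are written first, then main memory copies,
    # which go straight to disk rather than back onto the paging device.
    for level in (paging_device, core):
        for key in [k for k in level if k[0] == segment]:
            disk[key] = level.pop(key)


def truncate(segment, pages, core, paging_device, disk, free_records):
    """Delete the given pages at every level; no i/o is needed."""
    for page in pages:
        key = (segment, page)
        core.pop(key, None)
        paging_device.pop(key, None)
        if key in disk:
            del disk[key]
            free_records.append(key)   # disk record returned for reuse


core = {("a", 0): "new a0", ("b", 0): "b0"}
paging_device = {("a", 0): "old a0", ("a", 1): "a1"}
disk = {("b", 0): "stale b0", ("b", 1): "b1"}
free_records = []

deactivate("a", core, paging_device, disk)
truncate("b", [0, 1], core, paging_device, disk, free_records)
```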

4.2.2 Complications Introduced 

Since truncation and deactivation of segments both potentially
involve main memory and paging device memory deallocation, these operations
are logical candidates for implementation in the core and paging device
manager processes. Doing so necessarily complicates these processes as
they no longer perform a single task. They must now be awakened when a
segment is to be truncated or deactivated to perform the necessary steps.
This means when the core or paging device manager is started up, it must
determine why it was awakened, and perform the correct function. Note
also that just sending a wakeup signal is insufficient; more information 
is required in the case of a truncation or deactivation. In both 
instances the segment on which the operation is to be performed must be 
specified; additionally for a truncation which pages are to be deleted 
must also be indicated. 

Thus the core manager and paging device manager become message receivers,
responding to interprocess messages from other processes to free page 
frames, truncate specified pages or clean up designated segments. When a 
process wishes to truncate a segment, a message is first sent to the 
paging device manager process, which deletes any copies of pages of the 
segment on the paging device, returning the page frames bound to those 
pages to the free pool. Upon receiving notification of the completion of 
this part of the task from the paging device manager, a message is sent to
the core manager process asking it to finish the job. The core manager
deletes any copies of the segment's pages in core, adding their page 
frames to the pool of free main memory page frames, and signals that the 
truncation is complete. Deactivations are handled analogously, with pages 
being returned to disk rather than deleted. 
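
The ordering of the truncation messages can be sketched as below. The
Truncate message, the Manager class and the "done" completion notice are
hypothetical, and synchronous method calls stand in for interprocess
messages; only the sequence (paging device manager first, then core
manager) is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class Truncate:
    segment: str
    pages: tuple


class Manager:
    def __init__(self, name, frames, free_pool, trace):
        self.name = name
        self.frames = frames          # (segment, page) -> page frame number
        self.free_pool = free_pool
        self.trace = trace

    def receive(self, message):
        """Handle one message and return a completion notice."""
        if isinstance(message, Truncate):
            for page in message.pages:
                frame = self.frames.pop((message.segment, page), None)
                if frame is not None:
                    self.free_pool.append(frame)
            self.trace.append(self.name)
            return "done"
        raise ValueError(f"unknown message: {message!r}")


def truncate_segment(message, pd_manager, core_manager):
    # Paging device manager first; the core manager finishes the job.
    if pd_manager.receive(message) == "done":
        return core_manager.receive(message)


trace = []
pd = Manager("pd_manager", {("s", 0): 100, ("s", 1): 101}, [], trace)
cm = Manager("core_manager", {("s", 1): 5, ("t", 0): 6}, [], trace)
result = truncate_segment(Truncate("s", (0, 1)), pd, cm)
```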

An alternate strategy is possible and was contemplated for some time.
The truncation and deactivation functions could be performed by the user
process, rather than asking the system processes to perform these tasks.
This has the advantage of keeping the core and paging device manager
processes simple, but distributes part of the function of page control
back to the user. This implies deallocation of memory may be going on in
more than one place at a time. There is clearly a trade-off here between
making the system process more complex and shoving system functions back
into the user processes. In the final analysis it was felt the prime
consideration was to collect all the page control functions into a single
process.

4.3 Other Page Control Functions 

In section 2.1 two other page control functions were discussed: 
memory reconfiguration and memory wiring. In the context of the system 
processes, memory reconfiguration amounts to adding or deleting page 
frames from the supply that may be allocated; memory wiring means 
guaranteeing certain pages will not be removed from main memory. These 
tasks, though of secondary importance, are also within the province of the 
core and paging device manager processes. 

The steps involved in adding or removing memory have already been 
described in discussing memory reconfiguration (see section 2.2.3). These 
steps are carried out by the appropriate memory manager process in 
response to a request from the process performing the reconfiguration. On 
completion, the reconfiguration process is notified. Hence reconfiguring
page frames presents no additional complications, merely increasing the
number of functions the paging device manager and core manager processes 
must perform. 

Wiring pages (section 2.2.4) was implemented as a system procedure 
called by user processes. The only effect upon the core manager was to 
include a check for wired pages when choosing pages for removal. 
Implementing wiring in this fashion requires no action by the core manager 
process and was done largely for convenience, as the currently used wiring 
procedure could be used unchanged. Wiring could be done by the core
manager process just as easily, becoming an interprocess call instead of a
simple procedure call. Absolute wiring, however, must be implemented by
the core manager process since deallocation of some pages may occur and
special allocation techniques may be necessary. This adds an extra 
function to the core manager process. 
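
The check for wired pages during victim selection might look like the
following sketch. The record layout and the least-recently-used rule are
assumptions made for illustration; the text specifies only that wired
pages are excluded when choosing pages for removal.

```python
def choose_victim(frames):
    """Return the least recently used frame that is not wired, or None."""
    candidates = [f for f in frames if not f["wired"]]
    if not candidates:
        return None
    return min(candidates, key=lambda f: f["last_used"])


frames = [
    {"frame": 0, "last_used": 1, "wired": True},    # oldest, but wired
    {"frame": 1, "last_used": 5, "wired": False},
    {"frame": 2, "last_used": 3, "wired": False},
]
victim = choose_victim(frames)
```

Frame 0 is the least recently used here but is skipped because it is
wired, so frame 2 is chosen.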

Unwiring pages can be implemented in the core manager process or
simply by procedure calls. The choice is largely one of convenience. To 
reduce the amount of code rewritten for the test implementation, unwiring 
was implemented without the intervention of the core manager process.

CHAPTER 5
Eliminating the Global Page Table Lock 

One of the major benefits of having multiple processes implement the 
paging system is the ability to simultaneously execute two processes 
performing page control functions. This parallelism in the performance of 
page control functions is lost, however, if a global lock such as used in 
Multics (Section 3.2.1) is used to prevent data base contention. Since 
only a single process may have control of the lock, only one page control 
function may be executing at any moment. This of course prohibits
handling several page faults in parallel. 

In this chapter a strategy for splitting the global page table lock 
will be developed. By identifying the processes using each page control
data base and which data bases each process may reference simultaneously, a
strategy using individual data base locks can be implemented. Such a
scheme allows full advantage to be taken of the multi-processing capability
of the combination page control presented in section 3.3.1, including
simultaneous handling of page faults.

5.1 The Strategy

One reason the global lock is used in Multics is that all page
control functions are performed at page fault time. Thus a process 
handling a page fault will first access the paging device used and free 
lists, then the core used and free lists, etc. Since every user process 
taking a page fault must access all the page control data bases, all the 
data bases are subject to data base contention. The global lock protects 
everything, even though some data will no longer be referenced or are not 
yet needed. 

Hence a first step in dividing the global lock is determining which 
data bases are subject to contention. Clearly if a data base is accessed 
by only a single process that data base need not be protected. Figure 5.1 
presents this information for the page control data bases. For example, a 
user process handling a page fault would have to access the core free list 
to obtain a free page frame to allocate to the faulted-on page. Clearly 
the core free list must also be referenced by the core manager process 
since the core manager is the process responsible for adding page frames 
to the free list. 

Not surprisingly, all the data bases are used by more than one 
process. Pages are referenced not only by both of the system processes 
but also by user processes faulting on the pages. The other four data 
bases are each accessed by two or more processes. To allow parallel 
execution while preventing contention, access to these data bases must be 
arbitrated in some fashion. A lock on each list is the obvious solution. 
Thus we assume a lock is associated with each of the four lists; the lock 
must be set before access to the corresponding list is allowed. 

[Figure 5.1: Processes accessing page control data bases] 

Similarly a lock will be associated with each page, and the lock must 
be locked before operations may be carried out on the page (e.g. 
resolving a page fault). This of course is not new; Multics already has 
such a per page lock. 

With multiple locks, precautions are necessary to preclude system 
deadlocks. Thus a second important step in eliminating the global lock 
and replacing it with distributed locks is determining under what 
conditions a process needs to lock more than one data base. If such 
conditions never occur, a system deadlock cannot occur due to two 
processes waiting for locks held by one another. 

Situations where a process needs access simultaneously to two objects 
protected by locks occur frequently, as shown in Figure 5.2. For 
instance, any user process taking a page fault must lock the faulted on 
page while the page is read in, and while the page is locked the process 
must access the core used list to add the page to the used list. 

At this point the next step is to develop a locking protocol defining 
allowable actions on the locks which guarantee system deadlocks cannot 
occur. We will use the standard Multics avoidance strategy which involves 
a "locking hierarchy" and "waiting rules". The locking hierarchy states 
the order in which locks are locked. This ensures that if two processes 
both need locks A and B then both processes lock these locks in the same 
order, preventing one process from locking A and waiting for B while the 
other process locks B and waits for A. The waiting rules state when a 
process may wait for a lock without giving up the processor (i.e. when 
waiting may be "busy" waiting, done by repeating the attempt to set the 
lock until successful, as opposed to non-busy waiting, implemented in 
software and requiring the process to surrender the processor). Thus a 
process must not be allowed to surrender the processor (block itself) with 
a lock set if some other process might perform a busy wait on that lock. 
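
The effect of a locking hierarchy can be illustrated with a minimal 
sketch. It is written in Python rather than the PL/1 of the actual 
implementation, and the names RankedLock, run_with and demo are 
hypothetical, introduced only for this illustration. Two threads ask for 
the same two locks in opposite orders; because the locks are always taken 
in rank order, neither thread can hold one lock while waiting for the 
other in the reverse order. 

```python
import threading

class RankedLock:
    """A lock with a fixed position in the locking hierarchy (assumed model)."""
    def __init__(self, rank):
        self.rank = rank
        self.lock = threading.Lock()

def run_with(locks, action):
    """Acquire the given locks in hierarchy order, run action, then release."""
    ordered = sorted(locks, key=lambda l: l.rank)
    for l in ordered:
        l.lock.acquire()
    try:
        return action()
    finally:
        for l in reversed(ordered):
            l.lock.release()

def demo(rounds=1000):
    """Two threads request locks A and B in opposite orders."""
    a, b = RankedLock(1), RankedLock(2)
    total = [0]

    def bump():
        total[0] += 1          # protected: both locks are held here

    def worker(locks):
        for _ in range(rounds):
            run_with(locks, bump)

    threads = [threading.Thread(target=worker, args=(order,))
               for order in ([a, b], [b, a])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Without the sort in run_with, the two workers could each take one lock 
and wait forever for the other, which is exactly the situation the 
hierarchy rules out. 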

It is not difficult to determine what the protocols must be. From 
Figure 5.2 it can be seen that two levels of locks exist: the locks on the 
four lists, and the page locks. A process needs to have only one of each 
locked at a time. Clearly, the protocol must require locking the page 
lock first. For example, after a page fault, the process taking the fault 
must lock the page before accessing the core free list to allocate a page 
frame. This is to ensure another process has not already begun allocating 
a page frame to the page. Hence we have the following rule defining the 
locking hierarchy (order of locking): 

A page must be locked before attempting any operation on the page, 
and before that page may be added to or removed from the core used or free 
list, or the paging device used or free list. 

The waiting strategy is largely determined by the relatively long i/o 
times. That is, pages must remain locked while reads and writes from and 
to the paging device and disks are in progress. Hence pages will be 
locked for long times, making busy waiting on page locks hopelessly 
wasteful of processor time. (In addition, a process looping on a page lock 
could prevent the process that wished to unlock the page from ever 
executing and thereby freeing the lock.) Thus a process wishing to wait 
on a page's lock must block itself, giving up the processor. Note the 
hierarchy rule given above implies a process waiting on a page lock cannot 
possibly have one of the four used/free lists locked. 
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There is a further question of what to do if a process needs to lock 
several pages of the same segment simultaneously. Such a case may occur 
in performing such functions as deactivation or truncation (section 4.2.1) 
that operate on all pages within a segment. Usually such a problem may be 
solved by locking each page in turn, performing the necessary actions on 
the page, unlocking it and continuing with the next page, etc. In Multics 
this method is adequate; if it were not sufficient, however, locking the 
pages in order by page number would impose the necessary lock ordering to 
prevent deadlocks. 
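
The second alternative, locking several pages of a segment in page number 
order, might look like the following sketch. It is Python rather than 
PL/1, and Page, lock_pages_in_order, unlock_pages and truncate are 
hypothetical names used only for illustration. 

```python
import threading

class Page:
    """Minimal stand-in for a page with its per page lock (assumed model)."""
    def __init__(self, number):
        self.number = number
        self.lock = threading.Lock()
        self.in_core = True

def lock_pages_in_order(pages):
    """Lock every page, always in ascending page number order."""
    ordered = sorted(pages, key=lambda p: p.number)
    for p in ordered:
        p.lock.acquire()
    return ordered

def unlock_pages(ordered):
    """Release the page locks in the reverse of the locking order."""
    for p in reversed(ordered):
        p.lock.release()

def truncate(pages):
    """A segment operation that needs all pages locked at once."""
    held = lock_pages_in_order(pages)
    try:
        for p in held:
            p.in_core = False
    finally:
        unlock_pages(held)
```

Since every process sorts by page number before locking, two processes 
operating on overlapping sets of pages always contend for the lowest 
numbered common page first, so no circular wait can form. 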

For the locks on the used/free lists, busy waiting is not only 
possible but desirable. These lists need only be locked for several 
instructions, as long as required to add or delete an entry. Thus wait 
time should be minimal. Note assuming busy waiting here implies a process 
never gives up the processor with one of the four lists locked; that is, 
the add/remove operations must be non-interruptible. 

To summarize, the rules for waiting on locks are: 

1. A process must block itself while waiting on a page lock. 

2. A process may block itself with a page lock locked. 

3. A process may busy wait on the lock associated with any of the 
four lists of Figure 5.1. 

4. A process may not block itself with one of the four lists of 
Figure 5.1 locked. 

The last two rules are enforceable by requiring all additions and 
deletions to the lists be made using system functions. This has the added 
consequence that the callers need not even be aware of the existence of 
the locks or the rules. The primitives themselves are written to obey the 
protocols. Indeed, if the used/free lists could be implemented without 
locks by carefully choosing their structure, the last two rules would be 
unnecessary. Thus the implementation would be transparent to the user of 
the primitives. 
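
A list primitive of this kind can be sketched as follows. This is an 
illustration in Python, not the code of the implementation; GuardedList 
and its methods are hypothetical names. Python's threading.Lock suspends 
the waiting thread rather than spinning, so it only stands in for the 
busy-wait lock described above; the point of the sketch is that the lock 
is private to the primitive and is held only for the add or remove 
itself. 

```python
import threading
from collections import deque

class GuardedList:
    """A used or free list whose lock is hidden inside its primitives."""
    def __init__(self, items=()):
        self._lock = threading.Lock()   # never visible to callers
        self._items = deque(items)

    def add(self, entry):
        """Add an entry; the lock is held only for the append."""
        with self._lock:
            self._items.append(entry)

    def take(self):
        """Remove and return the oldest entry, or None if the list is empty."""
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)
```

Callers simply write free_list.take() or used_list.add(entry). Because no 
method calls out to other code while the lock is set, a caller cannot 
block with a list locked, which is rule 4 above. 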

In other cases, following these rules may require knowledge of the 
implementation of certain system functions. In particular, section 3.3.1 
discussed implementation of the routine that allocates free page frames. 
The approach chosen involves blocking the calling process if there are no 
free page frames. Processes using such an allocation routine must be 
aware that they may block themselves by calling the allocation routine, 
and ensure this would not violate the locking rules. 

How do these rules manifest themselves in practice? Consider the 
core manager while attempting to free page frames. He attempts first to 
lock a page. If the core manager fails in this attempt to lock the page, 
he merely tries another page on the used list. (If he really must have 
this particular page, by the rules above he must go blocked.) However 
assuming the core manager succeeds in locking the page, he may then 
examine it to decide if it is a good candidate for removal. If the core 
manager decides the page should be replaced, he removes it from the used 
list (locking the core used list while doing so), gets a paging device 
page frame to write the page to if the page is not already on the paging 
device (locking the paging device free list momentarily) and starts 
writing the page. The core manager may then block himself until the write 
completes, at which time he adds the paging device page frame to the 
paging device used list, unlocks the page, and finally adds the now free 
core page frame to the core free list. 
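
The sequence just described can be sketched as below. The sketch is in 
Python and is not taken from the implementation; Page, free_one_frame and 
the write_page callback are hypothetical. The four lists are plain Python 
lists here, and each list operation stands for a call on a lock-protected 
list primitive. The "used" flag stands for whatever test decides that a 
page is a poor candidate for removal. 

```python
import threading

class Page:
    """Stand-in for a page in core (assumed model)."""
    def __init__(self, name, frame, used=False, on_pd=False):
        self.name = name
        self.frame = frame        # core page frame holding the page
        self.used = used          # recently used: poor candidate
        self.on_pd = on_pd        # copy already on the paging device
        self.lock = threading.Lock()

def free_one_frame(core_used, core_free, pd_free, pd_used, write_page):
    """Try to free one core frame; return the evicted page or None."""
    for page in list(core_used):
        if not page.lock.acquire(blocking=False):
            continue                      # page is busy: try another one
        try:
            if page.used:
                continue                  # not a good candidate for removal
            core_used.remove(page)        # core used list locked momentarily
            if not page.on_pd:
                pd_frame = pd_free.pop()  # pd free list locked momentarily
                write_page(page, pd_frame)  # may block: only the page lock is held
                pd_used.append(pd_frame)
                page.on_pd = True
        finally:
            page.lock.release()           # page is unlocked before the frame is freed
        core_free.append(page.frame)
        return page
    return None
```

Note that no list is "locked" across the call to write_page; only the 
page lock is held while the core manager waits, as rules 2 and 4 require. 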
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A process taking a page fault blocks himself if he cannot lock the 
faulted-on page. When the page is unlocked, the process will be awakened 
and can try again. When the process succeeds in locking the page, he can 
then determine if the fault still needs to be resolved. 
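
A minimal sketch of this behavior, in Python with hypothetical names 
(Page, handle_fault, and the allocate_frame and read_in callbacks), is 
given below. Acquiring a Python lock suspends the calling thread until 
the lock is released, which corresponds to the non-busy wait required on 
page locks. 

```python
import threading

class Page:
    """Stand-in for a page and its per page lock (assumed model)."""
    def __init__(self):
        self.lock = threading.Lock()
        self.frame = None              # None means the page is not in core

def handle_fault(page, allocate_frame, read_in):
    """Resolve a fault on page; return True if this caller did the work."""
    with page.lock:                    # blocks until the page is unlocked
        if page.frame is not None:
            return False               # someone else resolved the fault meanwhile
        frame = allocate_frame()       # may itself block if no frames are free
        read_in(page, frame)           # page stays locked during the i/o
        page.frame = frame
        return True
```

If several processes fault on the same page at once, exactly one of them 
allocates a frame and reads the page in; the others wake up, find the 
fault already resolved, and return. 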

Adopting the scheme outlined above will indeed permit not only 
simultaneous execution of both system paging processes (or multiple 
instances of system processes) but also parallel execution of user 
processes handling page faults. As long as user processes do not attempt 
to resolve faults on the same page they will not interfere with one 
another. Waiting for data bases is minimized because the data bases 
(lists) need remain locked only while items are added or deleted. 

5.2 Locks on Segments 

The locking strategy presented in the preceding section is 
insufficient in a segmented system such as Multics. This is because 
certain information about each segment is maintained by page control. For 
example, the number of pages of the segment that are currently in main 
memory is one such item of information. In a per page locking scheme, 
there is no way to protect such data without additional mechanisms. For 
example, a process faulting on a page will need to increase the count of 
the number of pages in core for the page's segment; if simultaneously the 
core manager process is evicting a page of that segment it must decrement 
the number of pages in main memory by one. A race condition may develop 
leaving the number in an inconsistent state. 
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Another example of per segment information which page control 
maintains in Multics is "quota". In Multics, quota is an upper limit on 
the number of pages the segments of a directory may contain. (Multics has 
a hierarchical file system where all segments are cataloged in special 
directory segments. A directory's quota restricts the amount of storage 
that may be consumed by segments within that directory.) Page control 
must keep track of the quota as well as the number of pages used by the 
segments in the directory. A full discussion of quota is postponed to the 
next section. 

Thus in practice a segmented system would need to add another level 
of locks, namely per segment locks, to protect the information associated 
with each segment and manipulated by page control. It should be 
emphasized that although the term segment lock is used, these locks are 
used only by page control and not by segment control. Segment control may 
need to use some sort of lock for proper implementation of its functions; 
however, the segment locks discussed here are not intended for such use. 
The per segment locks discussed here are not locks on the segment, but on 
the page control information associated with each segment. Since they are 
implemented beneath segment control, segment control should not be aware 
of their existence. 

How should these per segment locks be incorporated? One solution 
would be to use the per segment locks in place of the per page locks. In 
this scheme, access to all of the pages comprising the segment as well as 
to the per segment information, would be controlled by the segment lock. 
Having a single lock control all the pages in a segment means that once a 
process has locked a segment while processing a page fault, no other 
process could perform any action on that segment (e.g. fault on another 
page, remove a page of the segment from core) until the page fault had 
been completed. Note, though, this restriction would be advantageous 
under certain circumstances, i.e. when performing a segment operation such 
as truncation or deactivation which operates on all of the pages of the 
segment. In such cases locking the segment lock allows the entire 
operation to be performed, whereas in a per page locking scheme each page 
in the segment must be locked. 

A better strategy is to implement the segment locks beneath the page 
locks, in the same manner as the locks on the used and free lists. The 
segment locks, like the locks on the lists, need only be locked for a few 
instructions while the per segment information is updated. The rules 
applying to the locks protecting the used and free lists must also be 
observed for the segment locks. That is, a process may lock a segment 
only after the page (if any) the process is operating on is already 
locked. Segment locks can be busy waited on, but a process must unlock 
any segments it has locked before abandoning the processor. 
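
The following sketch shows a segment lock used beneath the page lock to 
protect the count of pages in core. It is Python, not the PL/1 of the 
implementation, and Segment, Page, bring_in and evict are hypothetical 
names. The segment lock is taken only after the page lock and is held 
only for the update of the count. 

```python
import threading

class Segment:
    """Per segment page control information and its lock (assumed model)."""
    def __init__(self):
        self.lock = threading.Lock()   # guards pages_in_core only
        self.pages_in_core = 0

class Page:
    def __init__(self, segment):
        self.segment = segment
        self.lock = threading.Lock()
        self.in_core = False

def bring_in(page):
    """What a faulting process does to the count once the page is in core."""
    with page.lock:                    # page lock first
        if not page.in_core:
            page.in_core = True
            with page.segment.lock:    # then the segment lock, briefly
                page.segment.pages_in_core += 1

def evict(page):
    """What the core manager does to the count when evicting the page."""
    with page.lock:
        if page.in_core:
            page.in_core = False
            with page.segment.lock:
                page.segment.pages_in_core -= 1
```

Faults and evictions on different pages of the same segment proceed in 
parallel under their own page locks and serialize only for the single 
increment or decrement. 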

This strategy of implementing the segment locks does not conflict 
with the implementation of the locks on the free and used lists because a 
process never needs to have one of the lists and a segment locked 
simultaneously. (If such a situation did arise, appropriate ordering 
rules would prevent deadlocks.) Happily the addition of per segment locks 
does not place any restrictions on what page control functions may be 
executing in parallel. Several user page faults may still be resolved at 
once; if by chance page faults on two pages of the same segment are being 
handled, at worst one process will wait momentarily while the other has 
the segment of interest locked. 



5.3 Multics Complications 

The per segment locking strategy just described for Multics has not 
been implemented. This section discusses two complications which 
prevented the segment locking scheme from being added to the multi-process 
implementation of page control on Multics in the time available. 

The first problem is ensuring that the global page table lock is not 
being used in obscure ways by programs knowledgeable of its function to 
protect data against contention. In fact, one good argument for removing 
the global lock is to force such assumptions to be made explicitly. 
Knowing that a global lock protects many data bases makes it very 
tempting for a programmer to take advantage of the global lock by using a 
certain location in a data base as a temporary because he "knows" the 
global lock protects that location against any other use while he has the 
lock set. 

As an example of a hidden use of the global page table lock, consider 
the following from Multics: Requests to the bulk store paging device for 
i/o are queued as they arrive for actual execution later. The queues kept 
are protected only by the global page table lock. That is, the code is 
not written to allow several processes to be accessing the queues 
simultaneously. Removal of the global lock could therefore result in 
errors in these queues unless a separate lock were added to protect the 
queues. 

Unfortunately such assumptions are not usually documented. They are 
not discovered until such time as they result in a fatal system error of 
some type. 

The second source of difficulty is the Multics implementation of 
quota. Actually, the problem is caused by the interaction of three 
features: quota, the hierarchical file structure, and dynamic segment 
growth. There are two numbers associated with each directory in Multics: 
the quota, or maximum number of pages (disk records) the segments of the 
directory and inferior directories may occupy; and the records used, which 
is the actual current count of storage used. A directory may be specified 
as having no quota, in which case any quota placed on superior directories 
is the only constraint on the directory (e.g. if directory beta is 
immediately inferior to directory alpha and assuming alpha has a quota of 
100 and beta has no quota, segments in beta can never occupy more than 100 
pages). 

The crucial factor is that Multics allows dynamic growth of segments. 
By merely referencing a non-existent page of a segment a process can 
create that page. Referencing the non-existent page causes a page fault, 
and page control creates a page of zeroes. At this point trouble arises, 
for this creation must be reflected in the records used count of the 
segment's parent directory. Thus, while the segment the page fault was on 
is locked, the segment's parent directory must also be locked to update 
the records used count. If the records used is less than the directory's 
quota, the creation is valid. However if the records used would now 
exceed the quota, the page may not be created and page control must notify 
the faulting process of an error. The situation is complicated if the 
segment's parent directory does not have a quota limit, in which case the 
directory's parent must be checked, etc. until a superior directory is 
found that does have such a limit. At each step up the hierarchy, the 
directory (which is, of course, a segment) must be locked in order to 
increment its records used count. When a directory with a quota is found, 
the check can be made. 

The difficulty arises in locking all the segments at the same time. 
They must be locked, because some very important per segment information 
is being changed. Since locking the directories is always from the bottom 
up (in terms of the hierarchy tree), there is no danger of a deadlock. 
But recall that the previously presented locking rules forbid a process 
from blocking itself with any segments locked. Hence if at any point a 
process cannot lock a particular directory in its search for a directory 
with a quota limit, it must unlock all locked segments and block itself, 
starting over again when awakened. 
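
The bottom-up search with back-off can be sketched as follows. This is 
an illustration in Python and not the Multics quota code; Directory and 
charge_new_page are hypothetical names. Two simplifications are assumed: 
the quota check is made before any count is incremented, so a refused 
request never has to be undone, and a path that reaches the root without 
finding a quota is treated as unlimited. 

```python
import threading

class Directory:
    """Stand-in for a directory's quota information (assumed model)."""
    def __init__(self, name, parent=None, quota=None):
        self.name = name
        self.parent = parent
        self.quota = quota             # None: no quota of its own
        self.records_used = 0
        self.lock = threading.Lock()

def charge_new_page(directory):
    """Charge one new page against quota.

    Returns True if charged, False if the quota would be exceeded, and
    None if some directory was already locked; in that case everything
    has been unlocked and the caller must block itself and retry later.
    """
    held = []
    try:
        d = directory
        while d is not None:
            if not d.lock.acquire(blocking=False):
                return None            # back off; locks are released below
            held.append(d)
            if d.quota is not None:
                break                  # found the directory with the limit
            d = d.parent
        limit = held[-1]
        if limit.quota is not None and limit.records_used + 1 > limit.quota:
            return False
        for h in held:
            h.records_used += 1        # update every directory on the path
        return True
    finally:
        for h in reversed(held):
            h.lock.release()
```

With alpha given a quota of 100 and beta inferior to alpha with no quota, 
the hundred-and-first call of charge_new_page(beta) returns False, which 
corresponds to the alpha and beta example given earlier. 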

Of course, when pages are deleted (e.g. by a truncate operation), the 
records used must also be updated in a like manner. Multics further 
complicates matters by always deleting pages of zeroes. That is, if a 
program or data segment has an entire page of zeroes anywhere, that page 
of zeroes is automatically deleted each time it is removed from main 
memory (and recreated upon next reference). This is done on the 
assumption that creating the page is faster than reading and deleting is 
faster than writing, and that disk space will be saved. There is an impact of 
this decision on quota, in that such a page of zeroes is only charged 
against quota when actually in core. 

The implementation of quota and the deletion of zero pages complicate 
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the page control algorithms, and especially the locking strategy, 
tremendously. Various simplifications are possible; for example: do not 
allow segments to grow dynamically, or allow them to grow dynamically but 
insist a maximum size be specified and always count that maximum size 
against the quota (thus no change is needed in records used when a page of 
zeroes is created). Explicit operations could be used to change the size 
of a segment instead of having page control do the work automatically. 
Unfortunately all such solutions have noticeable effects on the system, 
and would change its functionality. The issue of quota, its 
implementation and impact on the system, is quite complex and is still 
being studied. 
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CHAPTER 6 



Conclusion 



This thesis has presented a design for a system that implements a 
virtual memory using asynchronous, cooperating sequential processes. 
This design was demonstrated to offer significant potential advantages 
over other designs in terms of simplicity, modularity, system and user 
security, and degree of expandability. 

The proposed system was built and tested on the Multics system. The 
implementation showed the feasibility of the design and the validity of 
the claimed advantages. 

It is felt that the technique of exploiting parallelism in performing 
system tasks by implementing those tasks as several cooperating sequential 
processes is extremely important and powerful. That this method can be 
made to work in practice and lead to operating systems whose design is 
simpler and better structured is the most significant result of this 
thesis. 

The Multics system offers several additional examples of places where 
a system process could be incorporated to perform tasks currently done by 
the user process. For example, section 2.1.4 mentioned that page tables 
are multiplexed among segments in the same fashion that page frames are 
multiplexed between pages. Currently, when a segment is activated, if no 
page tables are available, the user process must execute a "replacement 
algorithm" which frees up a page table by deactivating some other segment. 
The similarity with page replacement is obvious, and a system process 
could be used to keep a free pool of page tables in the same fashion as 
the core manager does for page frames in the design presented here. 

There is much that still can be done in this area. The test 
implementation could be greatly improved if the Multics scheduler were 
redesigned to truly implement system processes that could be scheduled 
without the considerable overhead of the current scheduler. The 
per-segment locking strategy proposed in section 4.4.2 would greatly 
improve the performance of multi-process page control in multiple 
processor systems. 

Finally, it is hoped the success of the implementation reported here 
will encourage other such attempts, perhaps along the lines of Hoare's 
proposed system or Saxena and Bredt's system, to see if the difficulties 
concerning those systems mentioned in sections 3.3.2 and 3.3.3 can be 
overcome. It would be interesting to compare implementations of such 
systems, or newly proposed systems, with that given here. 
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APPENDIX A 
Changes made to standard page control 



Changed Extensively 

page_fault 
post_purge 
pc 
pc_abs 
pc_contig 
pc_wired 
freecore 
delete_pd_records 
wired_plm 
evict_page 
page_error 
initialize_dims 
init_sst 
pxss 

Modules Added 

page_fault_pl1 
core_manager 
pd_manager 
read 
write 
core_free_list 
core_used_list 
pd_free_list 
pd_used_list 
utility 

Changed Slightly 

bulk_store_control 
disk_control 
free_store 
pc_trace 
wired_fim 
wired_shutdown 

Modules Deleted 

pd_util 
get_disk_meters 
meter_disk 
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APPENDIX B 
Components of Multi-process page control 

                                    Source      Object 
Name                  Language    statements    length 

page_fault            alm              560         580 
page                  alm               28         116 
device_control        pl1              136         896 
bulk_store_control    alm              369         386 
pc_trace              alm               45          68 
free_store            alm              133         138 
read                  pl1               62         318 
write                 pl1              192         956 
evict_page            pl1               39         142 
page_error            alm              217         349 
post_purge            alm              126         126 
get_disk_meters       pl1               12          22 
disk_control          pl1              247        1472 
pc_wired              pl1               70         312 
page_fault_pl1        pl1               32         170 
pc                    pl1              294        1740 
core_free_list        pl1               54         290 
core_used_list        pl1               49         232 
pd_free_list          pl1               53         230 
pd_used_list          pl1               40         180 
core_manager          pl1              282        1224 
pd_manager            pl1              179         724 
pc_contig             pl1               16          80 
utility               pl1               62         384 
quotaw                pl1               85         310 
thread                pl1               33         128 
get_ptrs_             alm               37          88 
pc_trace_pl1          pl1               85         812 
pc_abs                pl1               17         160 
wired_plm             pl1               45         162 
delete_pd_records     pl1               75         416 
freecore              pl1               14          66 

alm:                                  1515 
pl1:                                  2173 

Total                                 3688      13,277 
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Components of standard page control 







                                    Source      Object 
Name                  Language    statements    length 

page_fault            alm             1592        1616 
page                  alm               34         132 
device_control        pl1              118         134 
bulk_store_control    alm              369         376 
pc_trace              alm               45         252 
free_store            alm              133         142 
evict_page            pl1              147         168 
page_error            alm              376         614 
post_purge            alm              145         146 
get_disk_meters       pl1               12         116 
disk_control          pl1              247        1478 
pc_wired              pl1               71         254 
pc                    pl1              389        2144 
pc_contig             pl1               69         320 
quotaw                pl1               85         310 
thread                pl1               33         128 
get_ptrs_             alm               37          88 
pc_trace_pl1          pl1               85         812 
pc_abs                pl1               69         328 
wired_plm             pl1               36         152 
delete_pd_records     pl1              111         546 
freecore              pl1               34         140 
meter_disk            pl1               72          68 
pd_util               alm              394         402 

alm:                                  3423 
pl1:                                  1280 

Total                                 4703      10,866 
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APPENDIX C 



Code from multi-process page control 



The following code is taken directly from the implementation of the 
multi-process paging system implemented on Multics as described in Chapter 
4. The procedure "page_fault_pl1" is the code executed by the user 
process at page fault time; the procedures "core_manager" and "pd_manager" 
are the procedures executed by the core and pd manager processes 
respectively. While some code has been omitted (chiefly lower level 
subroutines and segment operations such as deactivation, truncation, 
wiring, etc.), no other changes have been made; all the programs listed 
were actually run on the Multics system. 

The normal operation of the system is fairly straightforward for the 
most part and follows the ideas already presented. Page faults are the 
events which drive the entire system. On the occurrence of a page fault, 
the page fault code is invoked. After determining the page causing the 
fault, a call is made to allocate a free core page frame. The allocation 
procedure is ultimately responsible for driving the core manager process, 
for when the number of free page frames falls too low, a wakeup is sent to 
the core manager. On receiving this signal, the core manager selects an 
in-use page frame to be replaced and writes the page held in the page 
frame out of main memory. After waiting for the write operation to 
complete, the core manager adds the now free page frame to the free list. 

The writing step may have several results, as an attempt will be made to 
write the page to the paging device. If a copy of the page is on the 
paging device and the page has not been modified, no write operation is 
necessary. But if the page is not yet on the paging device, or has been 
modified, a write must be performed. In the former case, a call must be 
made to allocate a paging device page frame, and this is the act which 
ultimately activates the paging device manager. When the allocation code 
notices too few paging device page frames are available, a wakeup signal 
is sent to the paging device manager. After receiving the wakeup the pd 
manager chooses a used paging device page frame to remove and performs a 
read write sequence if necessary (i.e. if the paging device copy has been 
modified with respect to the disk copy of the page, or if there is no disk 
copy). When the read write sequence is finished, the page frame is added 
to the free list. 
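
The decisions made in the writing step can be summarized in a small 
sketch. It is Python, not the PL/1 listed in this appendix, and the 
function names and step strings are hypothetical, chosen only to mirror 
the cases described above. 

```python
def core_eviction_steps(on_paging_device, modified):
    """Steps the core manager takes before a core frame can be freed."""
    steps = []
    if not on_paging_device:
        # No copy yet: a paging device frame must be allocated first.
        # This allocation is what may wake the paging device manager.
        steps.append("allocate paging device frame")
        steps.append("write page")
    elif modified:
        steps.append("write page")     # refresh the existing copy
    return steps                       # empty: no write is necessary

def pd_eviction_needs_read_write(modified_since_disk_copy, has_disk_copy):
    """True if the pd manager must copy the page to disk before freeing."""
    return modified_since_disk_copy or not has_disk_copy
```

For example, core_eviction_steps(True, False) returns an empty list, the 
case in which the page frame can be freed without any i/o. 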

Both the main memory replacement algorithm and the paging device 
replacement algorithm operate in a least recently used (LRU) fashion. The 
Multics hardware keeps modified and used bits in the page table word as 
mentioned in section 2.2.2. Each used list is implemented as a doubly 
linked circular list of entries, with a pointer to the least recently used 
item. This pointer identifies the first page frame examined when one is 
to be chosen for deallocation. 

The main memory replacement algorithm examines the used list until a 
page whose used bit is off is found. Any page looked at during this 
search whose used bit is on has the bit turned off. Once such a page is 
found, it is a candidate for removal. (Certain other checks are made, for 
example to ensure the page is not currently locked because it is 
undergoing a read or write operation.) As pages are examined, the pointer 
to the least recently used item is advanced so that after the page to be 
removed is selected the page frame immediately following it in the list 
(i.e. the first page not looked at) becomes the least recently used page. 
When pages are faulted on and read in, they are placed immediately behind 
the page pointed to by the least recently used pointer; this makes them 
"most recently" used. 

The paging device used list is managed in a similar way; however, 
there are no used bits associated with paging device pages. Thus rather 
than searching for the first page on the paging device used list with a 
used bit off, the first page that is not currently also in main memory is 
selected for removal. The rationale for this decision is that the page is 
in use if in core, and thus should not be removed from the paging device. 
Note since the page is in main memory, sooner or later it will be evicted 
from main memory, and the eviction will be made easier and faster if the 
page is already on the paging device. When a page is read from the paging 
device to satisfy a fault, that constitutes a use of the page, so it is 
moved to the most recently used position in the used list. Similarly, 
when pages are first written to the paging device, they are entered into 
the most recently used spot in the list. 
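
The two replacement algorithms described above can be sketched as 
follows. This is a simplified illustration in Python and not the code of 
the implementation: a Python list with an index stands in for the doubly 
linked circular list and its least recently used pointer, and Frame, 
choose_core_victim, add_most_recent and choose_pd_victim are hypothetical 
names. 

```python
class Frame:
    """Stand-in for a core map entry (assumed model)."""
    def __init__(self, name, used=False, locked=False):
        self.name = name
        self.used = used        # hardware used bit of the page
        self.locked = locked    # page is undergoing a read or write

def choose_core_victim(frames, hand):
    """Scan from the LRU pointer; return (victim, new LRU pointer)."""
    n = len(frames)
    for step in range(2 * n):   # a second sweep sees the cleared used bits
        i = (hand + step) % n
        frame = frames[i]
        if frame.used:
            frame.used = False  # turn the bit off and move on
        elif not frame.locked:
            frames.pop(i)
            # The frame that followed the victim becomes least recently used.
            return frame, (i % len(frames) if frames else 0)
    return None, hand           # every candidate was locked

def add_most_recent(frames, hand, frame):
    """Place a newly read page just behind the LRU pointer."""
    frames.insert(hand, frame)
    return (hand + 1) % len(frames)

def choose_pd_victim(pd_used, in_core):
    """Remove and return the first pd page (LRU first) not also in core."""
    for i, page in enumerate(pd_used):
        if page not in in_core:
            return pd_used.pop(i)
    return None
```

A page inserted by add_most_recent is the last one the scan will reach, 
which is what makes it "most recently" used. 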

The code that follows makes use of several data bases that are given 
rather cryptic names. The comments in the code often refer to these data 
bases. The list below explains the meaning of each abbreviation and the 
purpose of each data base. These data bases are defined by PL/1 
structures. In the actual code, a statement of the form "%include sst;" 
causes the PL/1 structure declarations for the data base "sst" to be 
included in the source file by the compiler at compilation time. 

1. ast - active segment table 

The active segment table contains one entry (an "aste", or "active 
segment table entry") for each active segment in the system. Each aste 
consists of all the page table words for pages of the segment plus the per 
segment information kept by page control such as segment length, quota, 
etc. 

2. cmp - core map 

Each page frame is described by a "core map entry" ("cme") in the 
core map. The cme contains the information associated with the page 
frame, e.g. a pointer to the page table word of the page allocated the 
page frame. The core used and free lists are simply linked lists of 
cme's. 

3. pdmap - paging device map 

Each paging device page frame is described by a "pdme", or "paging 
device map entry", in a manner analogous to the core map entries. 
Similarly, the pd used and free lists are linked lists of pdme's. 

4. ptw - page table word 

Each page of an active segment is described by a page table word 
which contains the current address of the copy of the page highest in the 
memory hierarchy; i.e. a core address if the page is in core, otherwise a 
paging device address or, if the page is not on the paging device, a disk 
address. Used and modified bits, a lock bit, and a fault tag are also 
kept in the page table word. 

5. sst - system storage table 

The sst is the primary page control data base. It contains not only 
the core map, the paging device map, and the active segment table, but 
also all other page control variables and constants such as pointers to 
the beginning of the various free and used lists, the global page table 
lock, etc. A large portion of the sst is also devoted to metering 
information (number of page faults, number of read write sequences 
performed, etc.). 
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